Saturday, September 18, 2010

Don’t Do What Johnny Don’t Does

You know that sense of relief that you get when you stop banging your head against a brick wall? Well, I experienced that this week and I thought I’d share, just in case there are others out there who are going through the same thing.

I was setting up a SharePoint 2010 environment and configuring it to index PDF documents. This is still a manual task in SharePoint 2010 as the components are not built in to the platform. However, it’s not that hard, if you go about it the right way, that is.

You should already know that SharePoint 2010 will only run on a 64-bit operating system. Therefore you are going to need a 64-bit PDF iFilter to allow SharePoint to index the PDF content. There are a few on the market, some free, some you pay for. In this case the client decided to use the free Adobe PDF iFilter. Not a problem. I installed Adobe Reader 9 (which has an iFilter built in) on the SharePoint server, made a bunch of changes to the registry, updated Central Admin, rebooted a half dozen times, lit incense sticks and performed full indexes another half dozen times, yet still could not get PDFs indexed.

Blog post comments indicated that others were experiencing similar issues. The blog article would explain how easy it all was, then a few of the commenters would say that they just could not get it working.

The moment of enlightenment came when I realized that there is a difference between the iFilter built in to the Adobe Reader 9 application and the separate download that Adobe have for the 64-bit PDF iFilter. THAT is what I needed. Reader 9 is a 32-bit application, so the iFilter it carries is 32-bit too, which explains why its installation directory was C:\Program Files (x86), something I had been wondering about. Sure, this would all work fine on a 32-bit platform, but when you are on 64-bit, then the Reader app won’t cut it no more.

So I uninstalled the Reader app, installed Adobe’s 64-bit iFilter, and Microsoft Bob became my uncle. There was one registry setting that I needed to make (only one):

  1. Launch RegEdit on the SharePoint index server
  2. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ContentIndexCommon\Filters\Extension
  3. Create a new key called “.pdf” (with the leading dot, to match the other extension keys in this section)
  4. Set the (Default) value for the key to {E8978DA6-047F-4E3D-9C78-CDBE46041603}

If you look at the other registry keys in this section, the (Default) value is of type REG_MULTI_SZ, whereas RegEdit gives the (Default) value of a new key the plain REG_SZ type. I couldn’t figure out how to change the type for the (Default) value in my new “.pdf” key, so I got round this by exporting one of the other keys, editing it in Notepad and then importing it. I’m tricky.
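If the export, edit and import dance feels fiddly, the text of the .reg file can also be generated. What follows is only a rough sketch of that idea in Python, not something I ran on the server. The function names are made up for illustration, and the leading dot on the key name is an assumption based on how the neighbouring extension keys are usually named, so adjust it to match what you see on your own box. A REG_MULTI_SZ value is written in a .reg file as hex(7): followed by the UTF-16LE bytes of each string, a null after each string, and one more null to end the list.

```python
def multi_sz_hex(values):
    """Encode strings as the comma-separated bytes a .reg file uses for hex(7)."""
    # Each string is UTF-16LE followed by a two-byte null terminator.
    raw = b"".join(v.encode("utf-16-le") + b"\x00\x00" for v in values)
    # One extra two-byte null closes the whole list.
    raw += b"\x00\x00"
    return ",".join(f"{byte:02x}" for byte in raw)


def build_reg_text(parent_key, key_name, values):
    """Return .reg file text that sets (Default) of parent_key\\key_name as REG_MULTI_SZ."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{parent_key}\\{key_name}]",
        f"@=hex(7):{multi_sz_hex(values)}",
        "",
    ]
    return "\n".join(lines)


# Path and GUID are the ones from the steps above.
EXTENSION_PARENT = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search"
    r"\Setup\ContentIndexCommon\Filters\Extension"
)
ADOBE_IFILTER_GUID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"
# Assumption: the other keys under Extension carry a leading dot; change if yours differ.
EXTENSION_KEY = ".pdf"

if __name__ == "__main__":
    print(build_reg_text(EXTENSION_PARENT, EXTENSION_KEY, [ADOBE_IFILTER_GUID]))
```

Save the printed text as a .reg file and import it the same way as the hand-edited export. RegEdit’s own exports wrap the long hex line with trailing backslashes; the sketch leaves it on one line for brevity, which should import just the same, but compare it against an export of a neighbouring key before trusting it.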

Don’t forget that you also need to add the PDF extension to the list of File Types that SharePoint should index. You’ll find this under the management page of the Search Service Application.

I may also have restarted the SharePoint Server Search Service (net stop osearch14, then net start osearch14); I can’t recall.

After that, just run a full index of your SharePoint 2010 content and you should be good to go. As a side note, I’ve heard that you can’t configure PDF indexing on SharePoint Foundation 2010 (what you might call WSS 4.0). I haven’t tried, so can’t confirm or deny that.