Wednesday, May 17, 2006

Search me

This is another one of those "lessons I've learnt" posts, which will hopefully save someone else the time that I lost. Based on the rules of Karma, eventually I should be able to reclaim my lost time - if not in this life, maybe in the next.

SharePoint Portal Server has a set of noise word files that it uses to exclude certain terms from its index. This is to ensure that the indexes don't get cluttered up with terms that no-one searches for, like "the", "and" and "it".

Now you can edit these noise word files to add or remove words. They are just text files with each noise word on its own line.
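
To make the file format concrete, here is a small Python sketch that reads one of these files and shows what "noise" means for a list of search terms. It only illustrates the idea and is not how SharePoint itself processes the files. The function names are made up for this post, and the utf-8 default is a guess, so check how your own files are actually saved.

```python
from pathlib import Path


def load_noise_words(path, encoding="utf-8"):
    """Read a noise word file: one word per line, blank lines ignored."""
    text = Path(path).read_text(encoding=encoding)
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def strip_noise(terms, noise_words):
    """Return only the terms that are NOT in the noise word set."""
    return [term for term in terms if term.lower() not in noise_words]
```

With "re" still in the file, a search for "re budget" would effectively become a search for "budget" alone.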

Imagine my surprise when I edited my noise word files to remove a word ("re"), but my portal search did not return my documents. Here's what I learnt in the process of fixing this problem up:

1. There is a set of noise word files per portal - not one set for the whole server farm. If you edit the wrong set, it doesn't matter how often you search your portal, you won't get the results you expect.

2. You need to edit all appropriate noise word files. These are language-specific. There are three files that are relevant to English content - noiseeng.txt, noiseenu.txt and noiseneu.txt. Make sure you make the appropriate changes to ALL the files.

3. If you want to retrieve old content using words you have just removed from the noise word files, reset and rebuild the content indexes. Otherwise SharePoint will only return items added after you made the change.
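
Since point 2 is the one that is easiest to get wrong by hand, here is a rough Python sketch that removes a word from all three English files in one go. Treat it as a starting point rather than a tested admin tool: the function name and the utf-8 default are my own assumptions, and you should back up the files before running anything like this against a real server.

```python
from pathlib import Path

# The three English noise word files mentioned in point 2.
ENGLISH_NOISE_FILES = ("noiseeng.txt", "noiseenu.txt", "noiseneu.txt")


def remove_noise_word(config_dir, word, encoding="utf-8"):
    """Remove `word` from each English noise file; return the files changed."""
    target = word.strip().lower()
    changed = []
    for name in ENGLISH_NOISE_FILES:
        path = Path(config_dir) / name
        if not path.exists():
            continue  # skip files this portal does not have
        lines = path.read_text(encoding=encoding).splitlines()
        kept = [line for line in lines if line.strip().lower() != target]
        if len(kept) != len(lines):
            # Rewrite the file only when the word was actually present.
            path.write_text("\n".join(kept) + "\n", encoding=encoding)
            changed.append(name)
    return changed
```

Calling remove_noise_word(config_dir, "re") returns the names of the files it modified, which makes it easy to spot one that was missed. You still need to rebuild the content indexes afterwards, as per point 3.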

The noise word files are located in a portal-specific subdirectory under \Program Files\SharePoint Portal Server\data\applications\ on your SharePoint indexing server. To find out the specific subdirectory for your portal, follow these steps:
  1. From the home page of the portal, click Site Settings in the top right corner
  2. Click Configure Search and Indexing
  3. In the Content Indexes section, click on any of the entries
  4. Note the directory path specified in Local Address. Ignore the part after the long code.
  5. Look in the config folder under the folder you identified in the previous step.
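
Steps 4 and 5 can be sketched in Python as follows. I am assuming here that the "long code" is a GUID-style folder name, and the function name is my own invention, so adjust the pattern if your path looks different.

```python
import re
from pathlib import PureWindowsPath

# Assumption: the "long code" looks like a GUID, optionally wrapped in braces.
GUID_RE = re.compile(
    r"^\{?[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}?$"
)


def config_folder(local_address):
    """Cut the Local Address off after the long code and append 'config'."""
    parts = PureWindowsPath(local_address).parts
    for i, part in enumerate(parts):
        if GUID_RE.match(part):
            return PureWindowsPath(*parts[: i + 1], "config")
    raise ValueError("no GUID-like folder found in: " + local_address)
```

Pass it the Local Address value exactly as shown on the settings page and it returns the folder where the noise word files should live. PureWindowsPath is used so the sketch also runs on a machine that isn't the indexing server.
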
For more information on noise words, see:
http://support.microsoft.com/default.aspx?scid=kb;en-us;837847#E0ZF0ACAAA