SharePoint Search: processing filenames containing underscores

Posted by Todd Owen on Server Fault
Published on 2010-06-07T09:45:55Z


We use SharePoint Server 2007 to allow employees to search network file shares, but it seems that underscores in filenames are not treated as word separators when indexing the files.

As a result, a search for chocolate will:

  • match "chocolate milkshake.doc"
  • but not match "chocolate_cake.doc"

(Of course, this is a simplified example; in practice the content of the second file might include the word "chocolate" and match on that instead of the filename. But the problem itself is real enough: a common scenario in a corporate environment is that a user knows part of the name of the file they are looking for and expects to see matching filenames at the top of the search results. Using underscores in filenames is also a widely used convention within our company.)

Underscores are not treated as word separators in the file content either, although this is less of a concern for us. The root cause of this problem is possibly related to the behaviour of the word breakers that SharePoint uses (i.e. the language-specific DLLs that implement the IWordBreaker interface), although I haven't confirmed this yet.
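
To make the suspected behaviour concrete, here is a minimal Python sketch of the two tokenization rules. It is only an illustration and not SharePoint's actual word breaker: the function names are hypothetical, and the regular expressions are an assumption about how an underscore-preserving tokenizer versus an underscore-splitting tokenizer might behave.

```python
import re

def tokens_underscore_as_word_char(name):
    # Assumed "current" behaviour: \w matches letters, digits and the
    # underscore, so "chocolate_cake" is indexed as a single token.
    return [t.lower() for t in re.findall(r"\w+", name)]

def tokens_underscore_as_separator(name):
    # Assumed "desired" behaviour: [^\W_] matches letters and digits only,
    # so the underscore splits "chocolate_cake" into two tokens.
    return [t.lower() for t in re.findall(r"[^\W_]+", name)]

def matches(query, name, tokenizer):
    # A whole-word match: the query must equal one of the indexed tokens.
    return query.lower() in tokenizer(name)

if __name__ == "__main__":
    for name in ["chocolate milkshake.doc", "chocolate_cake.doc"]:
        print(
            name,
            matches("chocolate", name, tokens_underscore_as_word_char),
            matches("chocolate", name, tokens_underscore_as_separator),
        )
```

Under the first rule, a whole-word search for chocolate finds only "chocolate milkshake.doc"; under the second rule it finds both files, which is the behaviour we would like from the index.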

Does anyone know of a workaround for this issue? I have tested with Search Server 2008 Express too (which is based on the same technology), and it is also affected. I do not know whether the problem is fixed in SharePoint 2010 or not.

© Server Fault or respective owner
