SharePoint Search: processing filenames containing underscores
Posted by Todd Owen on Server Fault
Published on 2010-06-07T09:45:55Z
We use SharePoint Server 2007 to allow employees to search network file shares, but it seems that underscores in filenames are not treated as word separators when indexing the files.
As a result, a search for chocolate will:
- match "chocolate milkshake.doc"
- but not match "chocolate_cake.doc"
(Of course, this is a simplified example; in practice the content of the second file might include the word "chocolate" and match on that instead of the filename. But the problem itself is real enough, because a common scenario in a corporate environment is that a user knows part of the name of the file they are looking for and expects to see matching filenames at the top of the search results. And using underscores in filenames is a widely used convention within our company.)
Underscores are not treated as word separators in the file content either, although this is less of a concern for us. The root cause of this problem is possibly related to the behaviour of the word breakers that SharePoint uses (i.e. the language-specific DLLs that implement the IWordBreaker interface), although I haven't confirmed this yet.
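To illustrate the behaviour I suspect, here is a minimal Python sketch of a toy word breaker. It is not SharePoint's actual word breaker, and the function names (break_words, matches) and the underscore_is_separator flag are made up for this example. It only assumes that the indexer treats the underscore as part of a word, which would reproduce the results described above.

```python
import re


def break_words(text, underscore_is_separator=False):
    """Split text into lowercase tokens (toy model, not SharePoint's code).

    By default the underscore counts as a word character, so
    "chocolate_cake" stays a single token. With the flag set,
    the underscore separates words like a space or a dot does.
    """
    if underscore_is_separator:
        pattern = r"[A-Za-z0-9]+"
    else:
        pattern = r"[A-Za-z0-9_]+"
    return [token.lower() for token in re.findall(pattern, text)]


def matches(query, filename, underscore_is_separator=False):
    """Return True if the query equals one of the filename's tokens."""
    return query.lower() in break_words(filename, underscore_is_separator)


for name in ("chocolate milkshake.doc", "chocolate_cake.doc"):
    print(
        name,
        "| underscore as word character:", matches("chocolate", name),
        "| underscore as separator:", matches("chocolate", name, True),
    )
```

With the underscore treated as a word character, "chocolate_cake.doc" is indexed as the tokens "chocolate_cake" and "doc", so a whole-word query for chocolate finds nothing in the filename. Treating the underscore as a separator yields "chocolate", "cake" and "doc", which is the behaviour we would like to get.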
Does anyone know of a workaround for this issue? I have tested with Search Server 2008 Express too (which is based on the same technology), and it is also affected. I do not know whether the problem is fixed in SharePoint 2010 or not.
© Server Fault or respective owner