Wednesday, December 1, 2010

View Duplicates option not appearing on the search results page of a SharePoint site

I feel it’s important to share some notes I collected while getting the View Duplicates option to work in my environment. While googling, I could hardly find any article that explains this scenario, and I feel there is a real need to help community users overcome this problem with a few simple steps.

Let me give you some background so you understand the problem. Think it through for a while, then go ahead and implement the steps to get the desired functionality working.
First, a question to understand how and when “View Duplicates” appears on the search results page:
How is a duplicate document identified in search results?
Document similarity for purposes of identifying duplicates is based only on a hash of the document’s content. No file properties (e.g. file name, type, author, creation and modification dates) are input to this hash. The MSSDuplicateHashes table in the SSP’s search database holds, for each document, all the 64-bit hashes needed to determine whether one document is a near-duplicate of another. This table is read during a search if duplicate collapsing is enabled.
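To make the "content only" rule concrete, here is a minimal Python sketch. It is a simplification, not SharePoint’s algorithm: SharePoint keeps several 64-bit hashes per document in MSSDuplicateHashes to detect near-duplicates, whereas this sketch computes a single 64-bit fingerprint (the first 8 bytes of a SHA-256 digest, chosen here only for illustration) and therefore only catches exact duplicates. The Document class and its fields are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical document record; only `content` is used for hashing.
    url: str
    file_name: str
    author: str
    content: str


def content_hash64(doc: Document) -> int:
    """Return a 64-bit fingerprint computed from the document content only.

    URL, file name and author are deliberately ignored, mirroring the rule
    that file properties are not input to the duplicate hash.
    """
    digest = hashlib.sha256(doc.content.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # truncate to 64 bits


if __name__ == "__main__":
    a = Document("http://moss/Docs/spec.pdf", "spec.pdf", "alice", "same body text")
    b = Document("http://moss/Docs/L1/copy.pdf", "copy.pdf", "bob", "same body text")
    # Same content, different properties: the fingerprints match.
    print(content_hash64(a) == content_hash64(b))  # True
```

Because only the body feeds the hash, renaming a copy or moving it to another folder does not change its fingerprint.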

Here are my environment details: Windows Server 2008 R2 (a 64-bit-only platform) with MOSS 2007 Standard edition installed, one index server, one WFE and a SQL Server 2008 database server. I have a site collection based on the Blank Site template with a couple of document libraries, where hundreds of PDFs are uploaded into folders at different levels; for example, the same PDF has been uploaded into folders at four different levels. In this scenario, a search for any such PDF is expected to return that PDF on the search results page with a “View Duplicates” option below it. When the user clicks the View Duplicates link, the results page should list the four different URLs. That is how the View Duplicates functionality works. In my case, however, the search returned all four URLs as separate results, with no “View Duplicates” option.
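The expected behavior (one result, with the other copies behind View Duplicates) can be modelled as grouping hits by a content hash. This is a minimal Python sketch under stated assumptions: each hit is a hypothetical (url, content) pair, the URLs are made up, and an exact SHA-256 match stands in for SharePoint’s near-duplicate comparison.

```python
import hashlib


def collapse_duplicates(hits):
    """Group search hits by a hash of their content.

    Returns a list of (representative_url, all_urls) tuples in the order each
    content group was first seen. A group with more than one URL corresponds
    to a result that would show a "View Duplicates" link; all_urls is what
    that link would list.
    """
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for url, content in hits:
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        groups.setdefault(key, []).append(url)
    return [(urls[0], urls) for urls in groups.values()]


if __name__ == "__main__":
    hits = [
        ("http://moss/Docs/guide.pdf", "installation guide text"),
        ("http://moss/Docs/L1/guide.pdf", "installation guide text"),
        ("http://moss/Docs/L1/L2/guide.pdf", "installation guide text"),
        ("http://moss/Docs/L1/L2/L3/guide.pdf", "installation guide text"),
        ("http://moss/Docs/other.pdf", "unrelated text"),
    ]
    for rep, urls in collapse_duplicates(hits):
        label = f"  [View Duplicates: {len(urls)} URLs]" if len(urls) > 1 else ""
        print(rep + label)
```

In the broken state described above, the equivalent of every hit landing in its own group is what you see: four separate results and no link.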
Consider this workaround only for a 64-bit OS. Also refer to the article here, which explains indexing PDF files in SharePoint; my notes on making the View Duplicates option available in search results are also drawn from it.
These are the steps I took to make the View Duplicates option appear on the search results page:
1.    Install the 64-bit PDF iFilter on the index server (skip this step if it is already installed).
2.    Ensure the Data of the (Default) value for the .pdf file type is set as shown below.

Start → Run → regedit → HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

If the Data shown for .pdf is different, replace it with:
{E8978DA6-047F-4E3D-9C78-CDBE46041603}
3.    Reboot the server
4.    Start a full crawl of the desired content source and wait until it completes.
5.    Search for the PDF and verify that the View Duplicates option appears.
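If you want to double-check step 2 without clicking through regedit, the registry check can be scripted. This is a minimal sketch, assuming Python 3 is available on the index server (nothing in this setup requires it) and using the standard-library winreg module; the function names are hypothetical. It opens the 64-bit registry view (KEY_WOW64_64KEY) so that a 32-bit Python is not redirected to Wow6432Node. Writing requires administrator rights, and the reboot and full crawl from steps 3 and 4 are still needed afterwards.

```python
import sys

# Registry location and CLSID taken from step 2.
PDF_FILTER_SUBKEY = (
    r"SOFTWARE\Microsoft\Office Server\12.0\Search\Setup"
    r"\ContentIndexCommon\Filters\Extension\.pdf"
)
EXPECTED_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"


def needs_update(current_value):
    """Return True if the (Default) value is missing or is not the expected CLSID.

    GUIDs are case-insensitive, so the comparison ignores case and whitespace.
    """
    if current_value is None:
        return True
    return current_value.strip().upper() != EXPECTED_CLSID.upper()


def read_default_value():
    """Read the (Default) value of the .pdf key (Windows only).

    Returns None if the key or the value does not exist.
    """
    import winreg  # imported lazily so needs_update() works on any OS

    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PDF_FILTER_SUBKEY, 0, access) as key:
            value, _value_type = winreg.QueryValueEx(key, "")  # "" = (Default)
            return value
    except FileNotFoundError:
        return None


def write_default_value():
    """Create the key if needed and set its (Default) value to the expected CLSID."""
    import winreg

    access = winreg.KEY_SET_VALUE | winreg.KEY_WOW64_64KEY
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, PDF_FILTER_SUBKEY, 0, access) as key:
        winreg.SetValueEx(key, "", 0, winreg.REG_SZ, EXPECTED_CLSID)


if __name__ == "__main__" and sys.platform == "win32":
    current = read_default_value()
    print("Current value:", current)
    if needs_update(current):
        write_default_value()
        print("Updated to", EXPECTED_CLSID, "; now reboot and run a full crawl.")
```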

If you land in a similar scenario, try the steps above and let me know how it goes!
Feel free to post any comments you may have.