Search Results

Search found 1 results on 1 pages for 'pk188'.

Page 1/1 | 1 

  • Clustering Strings on the basis of Common Substrings

    - by pk188
    I have around 10000+ strings and have to identify and group all the strings which looks similar(I base the similarity on the number of common words between any two give strings). The more number of common words, more similar the strings would be. For instance: How to make another layer from an existing layer Unable to edit data on the network drive Existing layers in the desktop Assistance with network drive In this case, the strings 1 and 3 are similar with common words Existing, Layer and 2 and 4 are similar with common words Network Drive(eliminating stop word) The steps I'm following are: Iterate through the data set Do a row by row comparison Find the common words between the strings Form a cluster where number of common words is greater than or equal to 2(eliminating stop words) If number of common words<2, put the string in a new cluster. Assign the rows either to the existing clusters or form a new one depending upon the common words Continue until all the strings are processed I am implementing the project in C#, and have got till step 3. However, I'm not sure how to proceed with the clustering. I have researched a lot about string clustering but could not find any solution that fits my problem. Your inputs would be highly appreciated.

    Read the article

1