Detecting similar words among n text documents

Posted by javanes on Stack Overflow
Published on 2010-03-18T09:23:53Z Indexed on 2010/03/18 12:31 UTC

Filed under: datamining | Patterns | string-similarity | similarity


Hi,

I have n documents and want to find the words they have in common. For example, I want to be able to say that (n-3) of the documents include the word "web".
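To make the goal concrete, here is a minimal sketch of counting, for each word, how many documents contain it (often called the document frequency). The function names, the tokenizing regex, and the sample documents are assumptions made up for illustration, not part of the original question.

```python
import re
from collections import Counter


def document_frequency(documents):
    """Map each word to the number of documents that contain it."""
    df = Counter()
    for text in documents:
        # A set counts each word at most once per document.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        df.update(words)
    return df


def words_in_at_least(documents, min_docs):
    """Return the words that occur in at least min_docs documents."""
    df = document_frequency(documents)
    return {word: count for word, count in df.items() if count >= min_docs}


# Hypothetical sample data.
docs = [
    "The web is large",
    "Web pages link to other web pages",
    "A spider spins a web",
    "Nothing relevant here",
]
common = words_in_at_least(docs, min_docs=len(docs) - 1)
print(common)  # {'web': 3}
```

Each document is read once, so the cost grows with the total number of words rather than with the number of document pairs.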

Certainly I can do this with basic data structures, but there may be a more efficient algorithm, or a way to treat the same word with different suffixes as one word. Is there an algorithm for such purposes?
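Reducing words with different suffixes to a common form is usually called stemming, and the Porter stemmer is one well-known algorithm for English. The sketch below does not implement Porter's algorithm; it uses a deliberately crude suffix stripper (the suffix list and function names are assumptions for illustration) just to show where stemming would fit into the counting.

```python
import re
from collections import Counter

# Illustrative only: a real stemmer handles far more cases than this.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word):
    """Strip one common English suffix; a stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_document_frequency(documents):
    """Map each stem to the number of documents that contain it."""
    df = Counter()
    for text in documents:
        stems = {crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())}
        df.update(stems)
    return df


# Hypothetical sample data.
docs = ["She links pages", "He linked a page", "Linking is easy"]
df = stem_document_frequency(docs)
print(df["link"], df["page"])  # 3 2
```

A stripper this simple mangles many words ("boxes" becomes "boxe", for example), so in practice an established stemmer or lemmatizer would take the place of crude_stem.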

I am unfamiliar with the data mining world. Is there a general term for the effort of finding similarities between different documents? If there is, it will make my research much easier.

Thanks.

© Stack Overflow or respective owner
