Algorithm for sentence analysis and tokenization
Posted by Andrea Nagar on Stack Overflow, published 2010-05-28T00:27:01Z
Tags: c#, natural-language
I need to analyze a document and compile statistics on how many times each sequence of words is used (so the analysis is not of single words but of groups of words that recur together). I read that compression algorithms do something similar to what I want: they build dictionaries of blocks of text, each with a piece of information recording its frequency. Something similar to http://www.codeproject.com/KB/recipes/Patterns.aspx is what I have in mind. Is there anything like this already written in C#?
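Counting recurring word sequences like this is usually called n-gram counting. Below is a minimal sketch of the plain dictionary-based version in C#. It is not the algorithm from the Code Project article and not a compression algorithm; it simply counts every contiguous run of `minLength` to `maxLength` words. The class and method names (`PhraseCounter`, `Tokenize`, `CountPhrases`) are hypothetical, and the tokenization rule (runs of letters, digits and apostrophes, lowercased) is an assumption you would adapt to your documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, written for illustration only.
public static class PhraseCounter
{
    // Assumed tokenization: lowercase the text and treat runs of letters,
    // digits and apostrophes as words; all other characters are discarded.
    public static List<string> Tokenize(string text)
    {
        return Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{N}']+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    // Counts every contiguous sequence of minLength..maxLength words (n-grams).
    // Because punctuation is dropped, sequences can span sentence boundaries.
    public static Dictionary<string, int> CountPhrases(string text, int minLength, int maxLength)
    {
        var words = Tokenize(text);
        var counts = new Dictionary<string, int>();
        for (int start = 0; start < words.Count; start++)
        {
            for (int n = minLength; n <= maxLength && start + n <= words.Count; n++)
            {
                // Use the space-joined words as the dictionary key.
                string phrase = string.Join(" ", words.GetRange(start, n));
                counts.TryGetValue(phrase, out int current); // 0 if not present yet
                counts[phrase] = current + 1;
            }
        }
        return counts;
    }
}

public static class Program
{
    public static void Main()
    {
        string doc = "the cat sat on the mat. the cat sat on the hat.";
        var counts = PhraseCounter.CountPhrases(doc, 2, 3);

        // Print only sequences that occur more than once, most frequent first.
        foreach (var kv in counts.Where(kv => kv.Value > 1)
                                 .OrderByDescending(kv => kv.Value)
                                 .ThenBy(kv => kv.Key, StringComparer.Ordinal))
        {
            Console.WriteLine($"{kv.Value}\t{kv.Key}");
        }
    }
}
```

The number of dictionary entries grows roughly with document length times `maxLength - minLength + 1`, so keeping `maxLength` small (for example 2 to 5 words) keeps memory manageable. If you only care about sequences that actually recur, filter out entries with a count of 1 afterwards, as `Main` does. If you do not want sequences to cross sentence boundaries, split the text into sentences first and count each sentence separately.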
© Stack Overflow or respective owner