Tag Cloud Data Backend
Posted by Waldron on Stack Overflow. Published on 2010-04-08T21:10:06Z.
Tags: tag-cloud
I want to be able to generate tag clouds from free text that comes from any number of different sources. To be clear, I'm not asking how to display a tag cloud once the key tags/phrases have already been found. I want to discover the meaningful phrases themselves, preferably on a PHP/MySQL stack.
If I had to do this myself, I'd start by building an index of words/phrases that gives a "normal" frequency for each one. For example, if "Constantinople" occurs once in every 1,000,000 words on average, its normal frequency is 0.000001. Then, as I analyze a body of text, I'd find the individual words/phrases (another challenge!), count how often each occurs within the input, and compare that against its expected frequency. The words with the highest ratio of observed to expected frequency get boosted priority in the cloud.
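A minimal Python sketch of the approach described above, under a few assumptions that are not part of the question: the "normal" frequencies come from a hypothetical plain-text reference corpus, candidate phrases are simply 1- and 2-word sequences that don't start or end with a word from a tiny hypothetical stopword list, and the score is the plain observed/expected ratio with a pseudo-count (`smoothing`) so terms missing from the baseline don't cause a division by zero. In a PHP/MySQL setup the baseline counts would presumably live in a table; that part is not shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "it", "for", "with", "or"}


def tokenize(text):
    """Lowercase the text and return its words (letters, with internal apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def candidate_terms(tokens, max_n=2):
    """Yield every 1..max_n word sequence that doesn't start or end with a stopword."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def build_baseline(reference_texts, max_n=2):
    """Count candidate terms over a reference corpus; return (counts, total_tokens)."""
    counts = Counter()
    total_tokens = 0
    for text in reference_texts:
        tokens = tokenize(text)
        total_tokens += len(tokens)
        counts.update(candidate_terms(tokens, max_n))
    return counts, total_tokens


def score_terms(text, baseline_counts, baseline_total, max_n=2, smoothing=1.0, min_count=1):
    """Return (term, ratio) pairs sorted by observed/expected frequency, highest first."""
    tokens = tokenize(text)
    if not tokens:
        return []
    observed = Counter(candidate_terms(tokens, max_n))
    scores = {}
    for term, count in observed.items():
        if count < min_count:
            continue  # skip terms too rare in the input to be meaningful
        observed_freq = count / len(tokens)
        # Pseudo-count so terms never seen in the baseline get a small,
        # nonzero expected frequency instead of causing a division by zero.
        expected_freq = (baseline_counts.get(term, 0) + smoothing) / max(baseline_total, 1)
        scores[term] = observed_freq / expected_freq
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Usage would be along the lines of `counts, total = build_baseline(reference_texts)` followed by `score_terms(text, counts, total)`, taking the top entries as cloud tags. Terms that occur once in the input but never in the baseline get very high ratios, so raising `min_count` above 1 is probably worth it for short inputs.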
I'd like to believe someone else has already done this, WAY better than I could hope to, but I'll be damned if I can find it.
Any recommendations?
© Stack Overflow or respective owner