How to detect if 2 news articles have the same topic? (Python language-comparison)
- by resopollution
I'm looking for ideas on a recommended approach.
I'm trying to scrape headlines and body text from articles on a few specific sites, similar to what Google does with Google News.
The problem is that different sites may have articles on the exact same subject, worded slightly differently.
Can anyone point me to what I need to know in order to write a comparison algorithm that automatically detects similar articles?
Thanks very much in advance.
I use Python.
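One commonly used baseline for this kind of comparison (an assumption here, not something the question specifies) is to turn each article into a TF-IDF bag-of-words vector and compare articles by cosine similarity. TF-IDF gives less weight to words that appear in many articles, so pairs that share distinctive words score higher. The sketch below uses only the standard library. The stop-word list, the smoothed IDF formula, the 0.3 threshold, and the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `similar_pairs`) are all illustrative and hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "is", "for", "with", "at", "by"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict: term -> weight) for each document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF (assumed formula), so every term keeps a weight above zero.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(docs, threshold=0.3):
    """Return (i, j, score) for every document pair scoring at or above threshold."""
    vecs = tfidf_vectors(docs)
    pairs = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    articles = [
        "Central bank raises interest rates to fight inflation",
        "Interest rates raised by central bank as inflation climbs",
        "Local team wins championship after overtime thriller",
    ]
    # Expected: only the pair (0, 1) is reported; the sports story shares no terms.
    print(similar_pairs(articles))
```

This compares every pair of articles, which is fine for a few hundred articles but grows quadratically. The threshold would have to be tuned on real headlines and body text. Adding stemming would help, for example so that "raises" and "raised" match. scikit-learn's `TfidfVectorizer` and `cosine_similarity` provide the same idea with more options.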