Writing a post search algorithm.

Posted by MdaG on Stack Overflow See other posts from Stack Overflow or by MdaG
Published on 2010-05-27T08:27:15Z Indexed on 2010/05/27 8:31 UTC
Read the original article Hit count: 270

I'm trying to write a free text search algorithm for finding specific posts on a wall (similar kind of wall as Facebook uses). A user is suppose to be able to write some words in a search field and get hits on posts that contain the words; with the best match on top and then other posts in decreasing order according to match score.

I'm using the edit distance (Levenshtein) "e(x, y) = e" to calculate the score for each post when compared to the query word "x" and post word "y" according to: score(x, y) = 2^(2 - e)(1 - min(e, |x|) / |x|)

Each word in a post contributes to the total score for that specific post. This approach seems to work well when the posts are of roughly the same size, but sometime certain large posts manages to rack up score solely on having a lot of words in them while in practice not being relevant to the query.

Am I approaching this problem in the wrong way or is there some way to normalize the score that I haven't thought of?

© Stack Overflow or respective owner

Related posts about python

Related posts about post