Are there any working implementations of the rolling hash function used in the Rabin-Karp string sea

Posted by c14ppy on Stack Overflow See other posts from Stack Overflow or by c14ppy
Published on 2010-02-22T21:08:14Z Indexed on 2010/04/11 6:33 UTC
Read the original article Hit count: 543

Filed under:
|
|
|
|

I'm looking to use a rolling hash function so I can take hashes of n-grams of a very large string.

For example:

"stackoverflow", broken up into 5 grams would be:

"stack", "tacko", "ackov", "ckove", "kover", "overf", "verfl", "erflo", "rflow"

This is ideal for a rolling hash function because after I calculate the first n-gram hash, the following ones are relatively cheap to calculate because I simply have to drop the first letter of the first hash and add the new last letter of the second hash.

I know that in general this hash function is generated as:

H = c1ak - 1 + c2ak - 2 + c3ak - 3 + ... + cka0 where a is a constant and c1,...,ck are the input characters.

If you follow this link on the Rabin-Karp string search algorithm , it states that "a" is usually some large prime.

I want my hashes to be stored in 32 bit integers, so how large of a prime should "a" be, such that I don't overflow my integer?

Does there exist an existing implementation of this hash function somewhere that I could already use?


Here is an implementation I created:

public class hash2
{

    public int prime = 101;

    public int hash(String text)
    {
        int hash = 0;

        for(int i = 0; i < text.length(); i++)
        {
            char c = text.charAt(i);
            hash += c * (int) (Math.pow(prime, text.length() - 1 - i));
        }

        return hash;
    }

    public int rollHash(int previousHash, String previousText, String currentText)
    {

        char firstChar = previousText.charAt(0);
        char lastChar = currentText.charAt(currentText.length() - 1);

        int firstCharHash = firstChar * (int) (Math.pow(prime, previousText.length() - 1));
        int hash = (previousHash - firstCharHash) * prime + lastChar;

        return hash;
    }

    public static void main(String[] args)
    {
        hash2 hashify = new hash2();

        int firstHash = hashify.hash("mydog");
        System.out.println(firstHash);
        System.out.println(hashify.hash("ydogr"));
        System.out.println(hashify.rollHash(firstHash, "mydog", "ydogr"));
    }

}

I'm using 101 as my prime. Does it matter if my hashes will overflow? I think this is desirable but I'm not sure.

Does this seem like the right way to go about this?

© Stack Overflow or respective owner

Related posts about algorithm

Related posts about hash