Map large integer to a phrase
- by Alexander Gladysh
I have a large and "unique" integer (actually a SHA1 hash).
I want (for no other reason than to have fun) to find an algorithm that converts that SHA1 hash to a (pseudo-)English phrase. The conversion should be reversible (i.e., knowing the algorithm, one must be able to convert the phrase back to the SHA1 hash).
A possible use for the generated phrase: a human-readable version of a Git commit ID, like a motto for a given program version (which is built from that commit). (As I said, this is "for fun". I don't claim that this is very practical, or that the phrase would be much more readable than the SHA1 itself.)
A better algorithm would produce shorter, more natural-looking, more unique phrases.
The phrase need not make sense. I would even settle for a whole paragraph of nonsense. (Though the quality, that is, the Englishness, of a paragraph should probably be better than that of a mere phrase.)
A variation: it is OK if I can work with only part of the hash. Say, the first six digits would be enough.
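To make the reversibility requirement concrete, here is a minimal sketch of the simplest baseline: write the integer in base N, where N is the size of a fixed word list, and emit one word per digit. The word list, the function names `encode` and `decode`, and the sample input are all hypothetical and chosen only for illustration. The output is a string of words with no grammar, not English.

```python
import hashlib

# Hypothetical toy word list. With 16 words, each word carries exactly
# one hex digit (4 bits); a real list would be far larger.
WORDS = ["red", "blue", "green", "happy", "sad", "cat", "dog", "bird",
         "runs", "jumps", "sleeps", "eats", "quickly", "slowly", "today", "never"]
INDEX = {word: i for i, word in enumerate(WORDS)}


def encode(n: int, length: int) -> str:
    """Write n in base len(WORDS), most significant word first, padded to `length` words."""
    base = len(WORDS)
    out = []
    for _ in range(length):
        n, digit = divmod(n, base)
        out.append(WORDS[digit])
    if n:
        raise ValueError("n does not fit in the requested number of words")
    return " ".join(reversed(out))


def decode(phrase: str) -> int:
    """Inverse of encode: read the words back as base-len(WORDS) digits."""
    n = 0
    for word in phrase.split():
        n = n * len(WORDS) + INDEX[word]
    return n


# Demo on an arbitrary input: first six hex digits, then the full 160 bits.
digest = hashlib.sha1(b"example commit").hexdigest()
short = int(digest[:6], 16)
full = int(digest, 16)
short_phrase = encode(short, 6)    # six words for six hex digits
full_phrase = encode(full, 40)     # forty words for the whole hash
assert decode(short_phrase) == short
assert decode(full_phrase) == full
```

Phrase length here depends only on the list size: each word carries log2(N) bits, so the full 160-bit hash takes 40 words with 16 words in the list, and would take 10 words with a 65,536-word list.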
Possible approach: In the past I attempted to build a probability table (of words) and generate phrases as Markov chains, seeding the generator (picking branches from the probability tree) according to the bits I read from the SHA. This was not very successful: the resulting phrases were too long and ugly. I'm not sure whether this was a bug or a general flaw in the algorithm, since I had to abandon the attempt early.
Now I'm thinking about trying to solve the problem once again. Any advice on how to approach this? Do you think the Markov chain approach can work here? Something else?
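For reference, the idea of driving a Markov chain from the hash can be sketched as follows. This is not the code from that earlier attempt. It is a simplified illustration that assumes a tiny hand-written transition table (`CHAIN`, hypothetical) and picks successors uniformly instead of by probability. At each step the number is divided by the count of allowed successors, and the remainder selects the next word. Decoding replays the walk and rebuilds the number from the recorded choices.

```python
# Hypothetical toy transition table: each word maps to its allowed successors.
# Every successor is itself a key, and every list has at least two entries,
# so each step consumes some information and encoding always terminates.
DETERMINERS = ["the", "a", "my", "every"]
NOUNS = ["cat", "dog", "robot", "moon"]
VERBS = ["eats", "sees", "likes"]

CHAIN = {"<start>": DETERMINERS}
CHAIN.update({word: NOUNS for word in DETERMINERS})
CHAIN.update({word: VERBS for word in NOUNS})
CHAIN.update({word: DETERMINERS for word in VERBS})


def encode(n: int) -> str:
    """Walk the chain, using n as a mixed-radix number to pick each branch."""
    words, state = [], "<start>"
    while n > 0:
        successors = CHAIN[state]
        n, choice = divmod(n, len(successors))
        state = successors[choice]
        words.append(state)
    return " ".join(words)


def decode(phrase: str) -> int:
    """Replay the walk, recover each (choice, radix) pair, and rebuild n."""
    state, steps = "<start>", []
    for word in phrase.split():
        successors = CHAIN[state]
        steps.append((successors.index(word), len(successors)))
        state = word
    n = 0
    for choice, radix in reversed(steps):
        n = n * radix + choice
    return n


print(encode(5))  # prints "a dog"
assert all(decode(encode(n)) == n for n in range(10_000))
```

The sketch also suggests one reason such phrases come out long: a word with r possible successors carries only log2(r) bits. With about four successors per word that is roughly 2 bits per word, so a full 160-bit hash needs around 80 words, however natural the chain looks.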