Data on the Frequency of Edit Operations Required to Correct a Misspelt Word

Posted by gvkv on Stack Overflow
Published on 2010-05-17T21:17:22Z

Does anybody know of any data on the frequency of the types of mistakes people make when they misspell a word? I'm not referring to the words themselves, but to the errors made by the typist. For example, I personally make transposition errors the most, followed by deletion errors (that is, leaving out a letter I should have included), substitution errors and, lastly, insertion errors. However, it would not surprise me to find out that typing a wrong letter (a substitution error, e.g., xat instead of cat) is more frequent than leaving a letter out.
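
To make the four categories concrete, here is a minimal Python sketch that labels the single-edit error separating what was typed from what was intended. The function name classify_single_edit and the sample pairs are illustrative assumptions, not taken from any published data set; tallying the labels over real (typed, intended) pairs would give the kind of frequency table I'm asking about.

```python
from collections import Counter
from typing import Optional


def classify_single_edit(typed: str, intended: str) -> Optional[str]:
    """Name the single-edit typing error that turns `intended` into `typed`.

    Returns None when the words are identical or differ by more than one
    edit. `classify_single_edit` is a hypothetical helper for illustration.
    """
    if typed == intended:
        return None
    len_t, len_i = len(typed), len(intended)
    # Find the first position where the two words disagree.
    i = 0
    while i < min(len_t, len_i) and typed[i] == intended[i]:
        i += 1
    if len_t == len_i:
        if typed[i + 1:] == intended[i + 1:]:
            return "substitution"      # one wrong letter, e.g. xat for cat
        if (i + 1 < len_t
                and typed[i] == intended[i + 1]
                and typed[i + 1] == intended[i]
                and typed[i + 2:] == intended[i + 2:]):
            return "transposition"     # two adjacent letters swapped
        return None
    if len_t + 1 == len_i and typed[i:] == intended[i + 1:]:
        return "deletion"              # the typist left a letter out
    if len_t == len_i + 1 and typed[i + 1:] == intended[i:]:
        return "insertion"             # the typist added an extra letter
    return None


# Made-up sample pairs (typed, intended), only to show the tallying step.
pairs = [("teh", "the"), ("xat", "cat"), ("ct", "cat"), ("caat", "cat")]
tally = Counter(classify_single_edit(t, w) for t, w in pairs)
```

With the four made-up pairs above, each category is counted once; the interesting part would be running the same tally over a large set of real typing errors.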

My purpose is to make best guesses at correcting a word when I only have the user's original input. The idea is that if one type of error is more frequent than the others, then a correction that undoes that type of error is more likely to be right. I don't object to using a database of commonly misspelt words, but I prefer an algorithmic solution to depending on a corpus, especially if it might be faster.
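
A rough sketch of that idea in Python: undo each kind of error once, keep the results that are real words, and order them by how common the error type is assumed to be. The numbers in EDIT_WEIGHTS are placeholders that only mirror my own personal ordering (transposition, deletion, substitution, insertion); they are not measured frequencies, which is exactly the data I'm missing. The names candidates_by_error, rank_corrections and known_words are likewise made up for the example, and some list of valid words is still needed to filter the candidates.

```python
import string

# Hypothetical weights: placeholders following one typist's personal
# ordering, NOT measured error frequencies.
EDIT_WEIGHTS = {
    "transposition": 0.4,
    "deletion": 0.3,
    "substitution": 0.2,
    "insertion": 0.1,
}


def candidates_by_error(typed, alphabet=string.ascii_lowercase):
    """Map each word one edit away from `typed` to the highest-weighted
    error type that could have turned that word into `typed`."""
    proposals = []
    for i in range(len(typed) + 1):
        left, right = typed[:i], typed[i:]
        # Undo a transposition: swap two adjacent, different letters back.
        if len(right) > 1 and right[0] != right[1]:
            proposals.append((left + right[1] + right[0] + right[2:],
                              "transposition"))
        for ch in alphabet:
            # Undo a deletion: put a missing letter back in.
            proposals.append((left + ch + right, "deletion"))
            # Undo a substitution: replace one letter with another.
            if right and ch != right[0]:
                proposals.append((left + ch + right[1:], "substitution"))
        # Undo an insertion: drop one extra letter.
        if right:
            proposals.append((left + right[1:], "insertion"))
    best = {}
    for word, kind in proposals:
        if word not in best or EDIT_WEIGHTS[kind] > EDIT_WEIGHTS[best[word]]:
            best[word] = kind
    return best


def rank_corrections(typed, known_words):
    """Return (word, error_type) pairs found in `known_words`, with the
    most heavily weighted error type first and ties broken alphabetically."""
    found = candidates_by_error(typed)
    hits = [(w, kind) for w, kind in found.items() if w in known_words]
    return sorted(hits, key=lambda hit: (-EDIT_WEIGHTS[hit[1]], hit[0]))


# A tiny made-up word list, just to exercise the ranking.
known_words = {"the", "tech", "tea", "ten", "eh"}
ranking = rank_corrections("teh", known_words)
```

For the input teh, this puts the first (a transposition), ahead of tech, then tea and ten, then eh. Swapping real frequencies into EDIT_WEIGHTS would change the ordering without touching the rest of the code.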

© Stack Overflow or respective owner