Data on the Frequency of Edit Operations Required to Correct a Misspelt Word
- by gvkv
Does anybody know of any data that relates to the frequency of the types of mistakes the people make when they misspell a word? I'm not referring to words themselves, but tje errors that are made by the typist. For example, I personally make transposition errors the most followed by deletion errors (that is, not including a letter I should), substitution errors and lastly, insertion errors. However, it would not surprise me to find out that typing a wrong letter (a substitution error, e.g., xat instead of cat) is more frequent than not including a letter.
My purpose is to be able to make best guesses at correcting a word when I only have the original user's input. The idea being that if one type of error is more frequent than others, then it's more likely that correcting a word via that type of operation is correct. I don't object to using a database of commonly misspelt words but I prefer an algorithmic solution to depending on a corpus--especially if it might be faster.