How much information can you mine out of a name?
- by Finglas Fjorn
While not directly related to programming, I figured that the programmers on here would be just as curious as I was about this question. Feel free to close the question if it does not meet with the guidelines.
A name: first, possibly a middle, and surname.
I'm curious about how much information you can mine out of a name, using publicly available datasets. I know that you can get the following with anywhere between a low-high probability (depending on the input) using US census data: 1) Gender. 2) Race.
Facebook for instance, used exactly that to find out, with a decent level of accuracy, the racial distribution of users of their site (https://www.facebook.com/note.php?note_id=205925658858).
What else can be mined? I'm not looking for anything specific, this is a very open-ended question to assuage my curiousity.
My examples are US specific, so we'll assume that the name is the name of someone located in the US; but, if someone knows of publicly available datasets for other countries, I'm more than open to them too.
I hope this is an interesting question!