How to develop an English .com domain value rating algorithm?

Posted by Tom on Stack Overflow
Published on 2011-01-03T21:43:07Z

I've been thinking about an algorithm that should be able to roughly estimate the value of an English .com domain in most cases.

For this to work, I want to run tests that weigh the strengths and weaknesses of an English .com domain.

I have a simple point-based system in mind, where each domain property is given a weight that reflects its importance.
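
A minimal sketch of the weighted point system, assuming each property has already been turned into a numeric score. The property names, the default weight of 1.0 and the function name `weighted_score` are illustrative assumptions, not part of the original idea.

```python
# Minimal sketch: combine per-property scores with per-property weights.
# Property names and weights below are hypothetical examples.

def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted sum of property scores.

    Any property without an explicit weight counts with weight 1.0.
    """
    return sum(score * weights.get(name, 1.0) for name, score in scores.items())


example_scores = {"length": 18.0, "characters": 20.0, "words": 12.5}
example_weights = {"length": 1.5, "words": 2.0}  # "characters" falls back to 1.0
print(weighted_score(example_scores, example_weights))  # 72.0
```

Keeping the weights in a separate table makes it easy to tune them later, for example against domains whose sale prices are known.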

I had these properties in mind:

domain character length

E.g. 20 points are added initially. If the domain has 4 or fewer characters, no points are subtracted. For each extra character, one or more points are subtracted on an exponential basis (the more characters, the higher the penalty).
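
A sketch of the length rule, assuming the name is passed without ".com" and that the k-th extra character costs growth**(k - 1) points. The growth factor of 1.5 is an arbitrary assumption; the post only says the penalty should grow exponentially.

```python
def length_score(name: str, start: float = 20.0, free_chars: int = 4,
                 growth: float = 1.5) -> float:
    """Score the second-level name (without ".com") by its length.

    Names of `free_chars` characters or fewer keep all `start` points.
    The k-th extra character costs growth**(k - 1) points, so the first
    extra character costs 1 point and each later one costs more.
    The score never drops below 0.
    """
    extra = max(0, len(name) - free_chars)
    penalty = sum(growth ** (k - 1) for k in range(1, extra + 1))
    return max(0.0, start - penalty)
```

With the defaults, "ebay" keeps 20.0 points, "google" gets 17.5 and "peanutgalaxy" drops to 0.0.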

domain characters

E.g. 20 points are added initially. If the domain is purely alphabetic, no points are subtracted. For each non-alphabetic character, X points are subtracted (again with an exponential increase).
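
The same idea applied to non-alphabetic characters, as a sketch. Here X is assumed to be 2 points for the first offending character, doubling for each further one; both numbers are placeholders.

```python
def character_score(name: str, start: float = 20.0,
                    base_penalty: float = 2.0, growth: float = 2.0) -> float:
    """Penalize characters that are not ASCII letters (digits, hyphens).

    The k-th non-alphabetic character costs base_penalty * growth**(k - 1)
    points, so the penalty grows exponentially. The score never drops below 0.
    """
    non_alpha = sum(1 for ch in name if not (ch.isascii() and ch.isalpha()))
    penalty = sum(base_penalty * growth ** (k - 1) for k in range(1, non_alpha + 1))
    return max(0.0, start - penalty)
```

"shopdeals" keeps 20.0, "shop-deals" gets 18.0, "shop24" gets 14.0 and "shop-deals-24" (four non-letters, 2 + 4 + 8 + 16 = 30 points of penalty) ends at 0.0.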

domain name words

Scan the domain against a large offline English word database that also includes informal words, e.g. "tweet" should be recognized.

Question 1: where can I get a modern list of English words for use in such an application? Are these lists available for free? Are there lists like these that include informal words?

The more words are found per character, the more points are added. Because the score is relative to the number of characters, a long domain will not get a lot of points simply for containing more words.
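
A sketch of the word test. It reads "words found per character" as the share of the name's characters covered by recognized words, which is one possible interpretation. A small dynamic program finds the split that covers the most characters. `TOY_WORDS` is a hypothetical stand-in for the real word list asked about in Question 1.

```python
# Hypothetical toy dictionary; a real system would load a large word list.
TOY_WORDS = {"shop", "deals", "peanut", "galaxy", "free", "money", "tweet"}


def word_coverage(name: str, words: set[str], min_len: int = 2) -> tuple[int, list[str]]:
    """Return (characters covered, words used) for the best split of `name`.

    best[i] is the largest number of characters in name[:i] that can be
    covered by non-overlapping dictionary words of at least `min_len` letters.
    """
    n = len(name)
    best = [0] * (n + 1)
    used: list[list[str]] = [[] for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Option 1: character i-1 is not part of any word.
        best[i], used[i] = best[i - 1], used[i - 1]
        # Option 2: some dictionary word ends exactly at position i.
        for j in range(0, i - min_len + 1):
            piece = name[j:i]
            if piece in words and best[j] + len(piece) > best[i]:
                best[i], used[i] = best[j] + len(piece), used[j] + [piece]
    return best[n], used[n]


def word_score(name: str, words: set[str], max_points: float = 20.0) -> float:
    """Scale max_points by the fraction of characters covered by words."""
    if not name:
        return 0.0
    covered, _ = word_coverage(name.lower(), words)
    return max_points * covered / len(name)
```

For example, "shopdeals" splits into "shop" + "deals" and gets the full 20.0, while "freemoneyz" covers 9 of 10 characters and gets 18.0. Because the score is a ratio, length alone never raises it.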

word hype level

I believe this is a tricky one, but it should be what distinguishes perfect but boring domains from perfect and interesting ones.

For example, the following domain is probably not that valuable: www.peanutgalaxy.com

The algorithm should identify that peanuts and galaxies are not very popular topics on the web. This is just an example.

On the other hand, a domain like www.shopdeals.com should score well on the hype test, as shops and deals are quite popular topics on the web.

My initial thought would be to check how often these keywords are referenced on the web, preferably using some offline database.
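
Assuming some offline table of word frequencies were available, the hype test could be sketched as below. The counts in `TOY_COUNTS` are invented for illustration only and are not real web statistics. A log scale is used because word frequencies typically span several orders of magnitude.

```python
import math


def hype_score(words: list[str], counts: dict[str, int],
               max_points: float = 20.0) -> float:
    """Average log-scaled popularity of the words found in a domain.

    Each word is scaled by log(1 + count) / log(1 + largest count), so the
    most frequent word in the table scores 1.0 and unknown words score 0.0.
    """
    if not words or not counts:
        return 0.0
    top = math.log1p(max(counts.values()))
    if top == 0:
        return 0.0  # every count in the table is zero
    per_word = [math.log1p(counts.get(w, 0)) / top for w in words]
    return max_points * sum(per_word) / len(per_word)


# Invented toy counts, NOT real measurements.
TOY_COUNTS = {"shop": 1_000_000, "deals": 800_000, "peanut": 5_000, "galaxy": 20_000}
```

With these toy numbers, ["shop", "deals"] scores higher than ["peanut", "galaxy"], which is the ordering the post expects.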

Question 2: is this logic flawed, or does this hype level test have merit?

Question 3: are such "hype databases" available? Or is there anything else that could work offline? The problem with, e.g., querying Google is that it requires a lot of requests, given the many domains to be tested.

domain name spelling mistakes

Domains like "freemoneyz.com" are generally (note that I am making a lot of assumptions in this post, but I believe that's necessary) not valuable because of their spelling mistakes.

Question 4: are there any offline APIs available to check for spelling mistakes, preferably in JavaScript, or some database that I can interact with myself? Or would a word list help here as well?
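
A word list can indeed help. The sketch below, using a hypothetical toy word set, asks how many single-character edits are needed to read the whole name as a run of dictionary words: 0 means correctly spelled words, 1 suggests a likely misspelling such as "moneyz", and infinity means no such reading exists. The `edits1` candidate generator follows Peter Norvig's well-known spelling-corrector approach. The one-edit limit per piece and the minimum piece length of 3 are assumptions.

```python
import math
import string

# Hypothetical toy dictionary; a real system would load a large word list.
TOY_WORDS = {"shop", "deals", "peanut", "galaxy", "free", "money", "tweet"}


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return deletes | transposes | replaces | inserts


def min_spelling_edits(name: str, words: set[str], min_len: int = 3) -> float:
    """Fewest edits needed to read `name` as a run of dictionary words.

    Dynamic programming over prefixes: every piece must be either an exact
    word (cost 0) or one edit away from a word (cost 1). Returns math.inf
    when no such reading exists.
    """
    n = len(name)
    cost = [math.inf] * (n + 1)
    cost[0] = 0.0
    for i in range(1, n + 1):
        for j in range(0, i - min_len + 1):
            if cost[j] == math.inf:
                continue  # prefix name[:j] cannot be read as words
            piece = name[j:i]
            if piece in words:
                step = 0
            elif edits1(piece) & words:
                step = 1
            else:
                continue
            cost[i] = min(cost[i], cost[j] + step)
    return cost[n]
```

"shopdeals" needs 0 edits and "freemoneyz" needs 1 ("free" + "moneyz" read as "money"), so it would be flagged. Correctly spelled words that are missing from the list will also look like misspellings, so the quality of the word list matters here too.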

use of consonants, vowels, etc.

A domain that is easy to pronounce (e.g. Google) is usually much more valuable than one that is not (e.g. Gkyld).

Question 5: how does one test for pronounceability? Do you check for consonants, vowels, etc.? What does a valuable domain have? Has there been any work in this field, and where should I look?
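
One crude way to start is to penalize long runs of consonants and names without vowels, as in the sketch below. The thresholds are arbitrary assumptions, and treating "y" as a consonant is a simplification that misjudges words like "rhythm".

```python
import re

VOWELS = "aeiou"  # assumption: "y" is treated as a consonant


def pronounceability_score(name: str, start: float = 20.0, max_run: int = 2,
                           run_penalty: float = 5.0) -> float:
    """Heuristic: penalize consonant runs longer than `max_run` letters.

    Non-letters are ignored. A name without any vowel scores 0.
    Each consonant beyond `max_run` in a run costs `run_penalty` points.
    """
    letters = "".join(ch for ch in name.lower() if ch.isascii() and ch.isalpha())
    if not any(ch in VOWELS for ch in letters):
        return 0.0
    runs = re.findall(r"[^aeiou]+", letters)  # maximal consonant runs
    excess = sum(max(0, len(run) - max_run) for run in runs)
    return max(0.0, start - run_penalty * excess)
```

"google" keeps 20.0, "gkyld" gets 0.0, and "techcrunch" (runs "t", "chcr", "nch") gets 5.0. A more robust approach would be to score letter bigrams or trigrams against their frequencies in an English word list instead of relying on a hand-written rule.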

That is what I came up with, which leads me to my final two questions.

Question 6: can you think of any more strengths or weaknesses of English .com domains? Which ones? How would you implement them?

Question 7: do you believe this idea has any merit at all, or am I too naive? Anything I should know, read, or hear about? Suggestions/comments?

Thanks!

© Stack Overflow or respective owner