Algorithm to classify a list of products?
- by Martin
I have a list of product titles that describe more or less the same kind of product. For instance, the items in the list below are all Seagate hard drives.
1. Seagate Hard Drive 500Go
2. Seagate Hard Drive 120Go for laptop
3. Seagate Barracuda 7200.12 ST3500418AS 500GB 7200 RPM SATA 3.0Gb/s Hard Drive
4. New and shinny 500Go hard drive from Seagate
5. Seagate Barracuda 7200.12
6. Seagate FreeAgent Desk 500GB External Hard Drive Silver 7200RPM USB2.0 Retail
To a human, hard drives 3 and 5 are the same product. Going a little further, we could treat products 1, 3, 4 and 5 as the same and put products 2 and 6 in separate categories.
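To make the grouping problem concrete, here is a minimal sketch of a naive baseline: plain token overlap (Jaccard similarity) between titles. The `tokenize` and `jaccard` helpers, and the rule that rewrites "Go" as "GB", are assumptions made for illustration only, not a proposed solution.

```python
import re

# The six titles from the list above, copied verbatim.
PRODUCTS = [
    "Seagate Hard Drive 500Go",
    "Seagate Hard Drive 120Go for laptop",
    "Seagate Barracuda 7200.12 ST3500418AS 500GB 7200 RPM SATA 3.0Gb/s Hard Drive",
    "New and shinny 500Go hard drive from Seagate",
    "Seagate Barracuda 7200.12",
    "Seagate FreeAgent Desk 500GB External Hard Drive Silver 7200RPM USB2.0 Retail",
]


def tokenize(title):
    """Lowercase, rewrite e.g. '500go' as '500gb', split on whitespace."""
    text = re.sub(r"(\d+)go\b", r"\1gb", title.lower())
    return set(text.split())


def jaccard(a, b):
    """Number of shared tokens divided by number of distinct tokens."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    # Print the similarity of every pair, using the 1-based numbering above.
    for i in range(len(PRODUCTS)):
        for j in range(i + 1, len(PRODUCTS)):
            score = jaccard(PRODUCTS[i], PRODUCTS[j])
            print(f"{i + 1} vs {j + 1}: {score:.2f}")
```

With this baseline, products 1 and 2 score 3/7 (about 0.43) while products 3 and 5 score only 3/11 (about 0.27), the opposite of the human judgment. Raw token overlap is therefore not enough on its own: tokens such as the model name or the capacity would need more weight than filler words.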
We have a huge list of such products that I would like to classify. Does anybody know which algorithm would be best suited to this? Any suggestions?
I thought of a Bayesian classifier, but I am not sure it is the best choice. Any help would be appreciated!
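For reference, the Bayesian idea could be sketched roughly as follows: a small multinomial naive Bayes with add-one smoothing, written from scratch. The category names (`internal`, `laptop`, `external`), the hand-assigned labels, and the query titles are all hypothetical. They follow the grouping described above but are not real data.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labelled training set built from the six titles above.
TRAINING = [
    ("Seagate Hard Drive 500Go", "internal"),
    ("Seagate Hard Drive 120Go for laptop", "laptop"),
    ("Seagate Barracuda 7200.12 ST3500418AS 500GB 7200 RPM SATA 3.0Gb/s Hard Drive", "internal"),
    ("New and shinny 500Go hard drive from Seagate", "internal"),
    ("Seagate Barracuda 7200.12", "internal"),
    ("Seagate FreeAgent Desk 500GB External Hard Drive Silver 7200RPM USB2.0 Retail", "external"),
]


def tokenize(title):
    """Lowercase, rewrite e.g. '500go' as '500gb', split on whitespace."""
    return re.sub(r"(\d+)go\b", r"\1gb", title.lower()).split()


def train(examples):
    """Count titles per label, tokens per label, and collect the vocabulary."""
    doc_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for title, label in examples:
        tokens = tokenize(title)
        doc_counts[label] += 1
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return doc_counts, token_counts, vocabulary


def classify(title, model):
    """Return the label with the highest smoothed log-probability."""
    doc_counts, token_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    # Tokens never seen in training carry no information here, so skip them.
    tokens = [t for t in tokenize(title) if t in vocabulary]
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_tokens = sum(token_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for token in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = token_counts[label][token] + 1
            score += math.log(count / (total_tokens + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TRAINING)
    print(classify("Seagate 120Go laptop drive", model))
```

Note that this approach needs the categories to be fixed and some titles to be labelled up front, which is different from discovering the groups automatically.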
Thanks.