I run a product recommendation engine and I'm hitting a few snags. I'm looking to see if anyone has any recommendations on what I should do to minimize these issues.
Here's how the site works:
Users come to the site and are presented with product recommendations based on some criteria. If a user knows of a product that is not in our system, they can add it by providing the product name and manufacturer. We take that information, and:
Hit one API to gather all the product meta-data (and to validate the product spelling, etc). If the product is not in this first API, we do not allow it in our system.
Use the information from step 1 to hit another API for pricing information (gathered from many places online).
For the sake of discussion, assume that I am searching both APIs in the most efficient/successful manner possible.
For the most part, this works very well. I'd say ~80% of our data is perfectly accurate, but there are a few issues:
Sometimes the pricing API (Step 2) doesn't have any information for the product. The way the pricing API is built, it will always return something (theoretically, the closest possible match), and there's no guarantee that the product name is spelled exactly the same way in both APIs, so there's no automated way of knowing if it's the right product.
When the pricing API finds the right product, occasionally it has outdated, or even invalid pricing data (e.g. if it screen-scraped the wrong price from a website).
Since the site was fairly small at first, I was able to manually verify every product that was added to the website. However, the site has grown to the point where this is taking several hours per day, and is just not efficient use of my time.
So, my question is:
Aside from hiring someone (or getting an intern) to validate all the data manually, what would be the best system of letting my userbase self-manage the data. Specifically, how can I allow users to edit the data while minimizing the risk of someone ambushing my website, or accidentally setting the data incorrectly.