Identifying elements from data feeds generated by affiliate sites
- by SPI
I am working with data feeds from affiliate sites. The basic idea is to provide an interface where the user can paste a link to an XML data feed (these are huge, around 60 MB). The feed would then be streamed, parsed in small chunks, and mined for the required data, which would be stored in the database.
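To make the streaming step concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree.iterparse`. The function name `iter_records` and the `record_tag` parameter are my own placeholders, and the sketch assumes each record is a direct child of the feed's root element, which may not hold for every affiliate feed.

```python
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield one dict per record element without loading the whole feed.

    `source` is a filename or a binary file-like object.
    `record_tag` is the (assumed) tag name that wraps one item in the feed.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # The record is complete here; copy out its direct children.
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Detach processed records from the root so memory stays flat.
            root.clear()
```

Because this is a generator, the caller can insert rows into the database one at a time or in small batches while the 60 MB file is still being read.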
The problem is that different affiliate sites use different schemas for their XML. It is hard to map the elements in a feed to your database attributes when you don't know in advance which element contains what.
My solution: use XPath to traverse the first parent element and its descendants, fetch the element names along with their data, and ask the user to map each one to a database attribute by selecting from a set of radio buttons that represent those attributes. This would be done just once for each new feed; once the system knows what's what, it will automatically load the data from the XML into the database.
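Roughly, the discover-then-map idea could look like the sketch below. It is in Python and relies on the limited XPath subset that `xml.etree.ElementTree` supports. The names `discover_fields` and `apply_mapping`, and the shape of the saved mapping (XML path to database column), are assumptions for illustration only. Repeated sibling tags (for example several image elements) would collapse into one path here and would need extra handling.

```python
import xml.etree.ElementTree as ET


def discover_fields(record, prefix=""):
    """Return {relative_path: sample_text} for every leaf under a record.

    The result is what would be shown to the user next to the radio buttons.
    """
    fields = {}
    for child in record:
        path = f"{prefix}/{child.tag}" if prefix else child.tag
        if len(child):  # element has children, so descend into it
            fields.update(discover_fields(child, path))
        else:
            fields[path] = (child.text or "").strip()
    return fields


def apply_mapping(record, mapping):
    """Apply a saved {xml_path: db_column} mapping to one record element."""
    return {
        column: (record.findtext(path) or "").strip()
        for path, column in mapping.items()
    }
```

For a first record such as `<product><title>Lamp</title><price><amount>9.99</amount></price></product>`, `discover_fields` would offer the paths `title` and `price/amount` with their sample values. Once the user's choices are stored per feed, `apply_mapping` can run on every later record without further input.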
Does this sound viable? Is there a better solution? I realize this leaves an uncomfortable opening for human error.
Thanks.