Cleaning a dataset of song data - what sort of problem is this?
Posted by Rob Lourens on Programmers, published 2013-11-05T06:28:04Z
Tags: data, data-mining
I have a set of data about songs. Each entry is a line of text that includes the artist name, the song title, and some extra text; some entries contain only the extra text. My goal is to resolve as many of these entries as possible to songs on Spotify using its Web API.
My strategy so far has been to search for each entry via the API; if there are no results, I apply a transformation such as "remove all text between ( )" and search again. I have a list of these heuristics and have had reasonable success with them, but as the code gets more and more convoluted, I keep thinking there must be a more generic and consistent approach. I don't know where to look. Any suggestions for what to try, topics to study, or buzzwords to google?
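One way to keep the "search, transform, search again" loop from getting convoluted is to treat the heuristics as data: an ordered list of small string-to-string functions that a single driver applies and retries. The sketch below assumes a pluggable `search` callable; `fake_search` and its catalog are hypothetical stand-ins for a real Spotify Web API call, and the three transforms (`strip_parenthesized`, `strip_bracketed`, `strip_feat`) are illustrative examples, not the poster's actual rules.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# A heuristic is any function that maps a query string to a simplified query.
Transform = Callable[[str], str]


def _collapse(text: str) -> str:
    """Collapse runs of whitespace left behind by a removal."""
    return re.sub(r"\s+", " ", text).strip()


def strip_parenthesized(text: str) -> str:
    """Remove all text between ( ), e.g. '(Live)' or '(Radio Edit)'."""
    return _collapse(re.sub(r"\([^)]*\)", "", text))


def strip_bracketed(text: str) -> str:
    """Remove all text between [ ]."""
    return _collapse(re.sub(r"\[[^\]]*\]", "", text))


def strip_feat(text: str) -> str:
    """Drop a trailing 'feat. X' / 'ft. X' credit (hypothetical rule)."""
    return _collapse(re.sub(r"\s+(feat\.?|ft\.?)\s.*$", "", text, flags=re.IGNORECASE))


DEFAULT_TRANSFORMS: List[Transform] = [strip_parenthesized, strip_bracketed, strip_feat]


def resolve(
    entry: str,
    search: Callable[[str], list],
    transforms: Iterable[Transform] = DEFAULT_TRANSFORMS,
) -> Optional[Tuple[str, list]]:
    """Search the raw entry, then each cumulatively transformed version.

    Returns (query, results) for the first query with a non-empty result,
    or None if every variant fails. Duplicate or empty queries are skipped,
    so a transform that changes nothing costs no extra API call.
    """
    query = entry.strip()
    candidates = [query]
    for transform in transforms:
        query = transform(query)
        candidates.append(query)

    tried = set()
    for q in candidates:
        if not q or q in tried:
            continue
        tried.add(q)
        results = search(q)
        if results:
            return q, results
    return None


# Hypothetical stand-in for a call to the Spotify search endpoint.
CATALOG = {"the examples some song": ["Some Song - The Examples"]}


def fake_search(query: str) -> list:
    return CATALOG.get(query.lower(), [])


if __name__ == "__main__":
    print(resolve("The Examples Some Song (Live) feat. Guest", fake_search))
    # prints ('The Examples Some Song', ['Some Song - The Examples'])
```

Here the transforms are applied cumulatively, so their order matters; an alternative design tries each transform independently on the raw entry, or tries combinations. Either way, adding a heuristic means appending one function to the list rather than adding another branch to the search code.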