Normalising book titles - Python

Posted by RadiantHex on Stack Overflow
Published on 2010-03-16T22:51:09Z Indexed on 2010/03/16 23:01 UTC

Hi folks,

I have a list of book titles:

  • "The Hobbit: 70th Anniversary Edition"
  • "The Hobbit"
  • "The Hobbit (Illustrated/Collector Edition)[There and Back Again]"
  • "The Hobbit: or, There and Back Again"
  • "The Hobbit: Gift Pack"

and so on...


I thought that if I normalised the titles somehow, it would be easier to automatically work out which book each edition refers to.

import string
# Keep only ASCII letters and digits (this drops spaces and punctuation too)
normalised = ''.join(char for char in title
                     if char in string.ascii_letters + string.digits)

or

def normalise(title):
    # Keep everything before the first separator character
    normalised = ''
    for char in title:
        if char in ':/()|':
            break
        normalised += char
    return normalised

Neither works as intended, though: titles can contain special characters, and different editions can lay out their titles very differently.
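
For illustration, here is a rough sketch of one direction this could take, not a definitive solution. It assumes that edition notes tend to sit inside brackets or after a separator such as ":" and that accents can be folded away. The names normalise_title and group_editions are made up for this example.

```python
import re
import unicodedata
from collections import defaultdict


def normalise_title(title):
    """Reduce an edition title to a rough key for the underlying work.

    Hypothetical helper: the rules below are assumptions about how
    edition titles are laid out, not a general solution.
    """
    # Drop anything in (...) or [...], which often holds edition notes.
    title = re.sub(r"\([^)]*\)|\[[^\]]*\]", " ", title)
    # Keep only the part before the first ':', '|' or '/'.
    title = re.split(r"[:|/]", title, maxsplit=1)[0]
    # Decompose accented characters and discard the combining marks.
    title = unicodedata.normalize("NFKD", title)
    title = "".join(c for c in title if not unicodedata.combining(c))
    # Lowercase and collapse runs of whitespace.
    return " ".join(title.lower().split())


def group_editions(titles):
    """Map each normalised key to the original titles that produced it."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalise_title(title)].append(title)
    return dict(groups)


titles = [
    "The Hobbit: 70th Anniversary Edition",
    "The Hobbit",
    "The Hobbit (Illustrated/Collector Edition)[There and Back Again]",
    "The Hobbit: or, There and Back Again",
    "The Hobbit: Gift Pack",
]
groups = group_editions(titles)
```

With the five titles above, every edition should end up under the single key "the hobbit". The obvious weakness is that cutting at the first colon also truncates works whose real title contains one, so a rule like this would probably need a whitelist or a fuzzy comparison on top.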


Help would be very much appreciated! Thanks :)

