Normalising book titles - Python
Posted by RadiantHex on Stack Overflow
Published on 2010-03-16T22:51:09Z
Hi folks,
I have a list of book titles:
- "The Hobbit: 70th Anniversary Edition"
- "The Hobbit"
- "The Hobbit (Illustrated/Collector Edition)[There and Back Again]"
- "The Hobbit: or, There and Back Again"
- "The Hobbit: Gift Pack"
and so on...
I thought that normalising the titles somehow would make it easier to work out automatically which book each edition refers to.
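The overall idea can be sketched like this, assuming any normalisation function can be plugged in as a key: titles whose keys are identical are treated as the same book. The names `group_by_key` and `simple_key` are hypothetical, and `simple_key` only lowercases and collapses whitespace, so by itself it will not merge different editions; it only marks where a better normaliser would go.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def simple_key(title: str) -> str:
    # Hypothetical placeholder normaliser: lowercase and collapse runs of whitespace.
    return " ".join(title.lower().split())


def group_by_key(titles: List[str], key: Callable[[str], str]) -> Dict[str, List[str]]:
    # Bucket titles whose normalised keys are identical; input order is kept within each bucket.
    groups: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        groups[key(title)].append(title)
    return dict(groups)


titles = ["The Hobbit", "the  hobbit", "The Hobbit: Gift Pack"]
print(group_by_key(titles, simple_key))
```

Keeping the key function separate from the grouping means each normalisation attempt can be swapped in without touching the grouping code.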
import string
allowed = set(string.ascii_letters + string.digits)  # keep only ASCII letters and digits
normalised = ''.join(char for char in title if char in allowed)
or
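def normalise(title):
    # Keep everything before the first separator character
    normalised = ''
    for char in title:
        if char in ':/()|':
            break
        normalised += char
    return normalised.strip()  # drop the space left before the separator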
But neither approach works as intended: titles can themselves contain special characters, and different editions can lay out their titles very differently.
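One alternative, assuming a list of canonical work titles is available (the `CANONICAL` list below is hypothetical sample data), is to normalise both sides the same way and assign each edition to the canonical title whose normalised form is the longest whole-word prefix of the edition's normalised form. Because nothing is cut at a fixed separator, a title that itself contains a colon, such as "Star Wars: A New Hope", can still match its own catalogue entry instead of being truncated. This is a heuristic sketch, not a general solution: editions whose titles do not start with the work's title (for example "Hobbit, The") will not match, and results are only as good as the catalogue.

```python
import re
import unicodedata
from typing import List, Optional


def normalise(title: str) -> str:
    # Decompose accented characters (e.g. "é" -> "e" + combining accent) and drop the accents.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase, turn each run of non-alphanumeric characters into one space, then trim.
    return re.sub(r"[\W_]+", " ", stripped.lower()).strip()


def match_work(edition: str, canonical: List[str]) -> Optional[str]:
    # Return the canonical title whose normalised form is the longest whole-word
    # prefix of the edition's normalised form, or None if nothing matches.
    ed = normalise(edition)
    best: Optional[str] = None
    best_len = -1
    for work in canonical:
        key = normalise(work)
        # Whole-word prefix: exact match, or the key followed by a space.
        if key and (ed == key or ed.startswith(key + " ")) and len(key) > best_len:
            best, best_len = work, len(key)
    return best


CANONICAL = ["The Hobbit", "Star Wars", "Star Wars: A New Hope"]  # hypothetical catalogue

editions = [
    "The Hobbit: 70th Anniversary Edition",
    "The Hobbit (Illustrated/Collector Edition)[There and Back Again]",
    "Star Wars: A New Hope (Special Edition)",
    "The Silmarillion",
]
for e in editions:
    print(f"{e!r} -> {match_work(e, CANONICAL)!r}")
```

Preferring the longest matching key is what lets the more specific "Star Wars: A New Hope" win over the shorter "Star Wars"; the whole-word check stops "The Hobbits" from matching "The Hobbit".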
Help would be very much appreciated! Thanks :)
© Stack Overflow or respective owner