Grab two parts of a single, short string
Posted by TankorSmash on Stack Overflow
Published on 2012-07-03T21:01:13Z
Indexed on 2012/07/03 21:15 UTC
I'm looking to fill a Python dict with TAG:definition pairs, and I'm using RegExr (http://gskinner.com/RegExr/) to write the regex.
My first step is to parse a line from http://www.id3.org/id3v2.3.0 (or http://pastebin.com/VJEBGauL) and pull out the ID3 tag and the associated definition. For example, the first line:
4.20 AENC [#sec4.20 Audio encryption]
would look like this: myDict = {'AENC': 'Audio encryption'}
To grab the tag name, I've got it looking for at least 3 spaces, then 4 characters, then 4 spaces: _{3}[a-zA-Z0-9]{4}_{4}
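For reference, here is a minimal sketch of that tag-name step in Python. The sample line appears with collapsed whitespace, so the exact runs of spaces below are an assumption based on the description (3 spaces before the tag, 4 after). The capture group and the name `tag_pattern` are added for illustration.

```python
import re

# Assumed spacing: 3 spaces before the tag and 4 after, per the description.
line = "4.20" + " " * 3 + "AENC" + " " * 4 + "[#sec4.20 Audio encryption]"

# Three spaces, four alphanumerics (captured as the tag), then four spaces.
tag_pattern = re.compile(r" {3}([a-zA-Z0-9]{4}) {4}")

match = tag_pattern.search(line)
tag = match.group(1) if match else None
print(tag)  # AENC
```

The parentheses around the character class mean only the tag is returned by `group(1)`, with the surrounding spaces left out.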
That part is easy enough.
The second part, the definition, is not working out for me. So far, I've got (?<=(\[#.+?))_A, which should find, but not include, the [# as well as an indeterminate set of characters until it finds _A, but it's failing. If I remove .+? and replace _A with s, it works out alright. What is going wrong? *The underscores represent spaces, which don't show up on SO.
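One possible cause, assuming the pattern ends up running under Python's `re` module (RegExr may behave differently): `re` only accepts lookbehinds of fixed width, and `.+?` can match any number of characters. A small sketch of that check follows; the names `variable_width` and `fixed_width` are illustrative.

```python
import re

# The lookbehind from the question, with a literal space before the A.
variable_width = r"(?<=(\[#.+?)) A"

try:
    re.compile(variable_width)
    error_message = None
except re.error as exc:
    # Python's re rejects lookbehinds whose length can vary.
    error_message = str(exc)

print(error_message)

# With .+? removed, the lookbehind is always two characters wide and compiles.
fixed_width = re.compile(r"(?<=\[#)s")
found = fixed_width.search("[#sec4.20 Audio encryption]")
print(found.start() if found else None)
```

This would be consistent with the shortened version working while the longer one fails.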
How do I grab the definition, i.e. (Audio encryption), of the ID3v2 tag from the line using RegEx?
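One way this could be approached, offered as a sketch rather than a definitive answer, is to skip the lookbehind and use capture groups for both the tag and the definition. The spacing in the sample line is assumed, and `entry_pattern` and `my_dict` are illustrative names (the latter mirroring `myDict` above).

```python
import re

# Assumed spacing; the sample line appears with collapsed whitespace.
line = "4.20" + " " * 3 + "AENC" + " " * 4 + "[#sec4.20 Audio encryption]"

# Group 1: the four-character tag.
# Group 2: everything after the "#sec..." anchor up to the closing bracket.
entry_pattern = re.compile(r"\s+([A-Za-z0-9]{4})\s+\[#\S+\s+([^\]]+)\]")

my_dict = {}
match = entry_pattern.search(line)
if match:
    my_dict[match.group(1)] = match.group(2)

print(my_dict)  # {'AENC': 'Audio encryption'}
```

Using `\s+` instead of exact space counts makes the pattern tolerant of however many spaces the real file contains.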
© Stack Overflow or respective owner