Grab two parts of a single, short string

Posted by TankorSmash on Stack Overflow
Published on 2012-07-03T21:01:13Z

I'm looking to fill a Python dict with TAG:definition pairs, and I'm using RegExr (http://gskinner.com/RegExr/) to write the regex.

My first step is to parse a line from http://www.id3.org/id3v2.3.0 (or http://pastebin.com/VJEBGauL) and pull out the ID3 tag and its associated definition. For example, the first line:

4.20    AENC    [#sec4.20 Audio encryption]

would end up looking like this: myDict = {'AENC': 'Audio encryption'}

To grab the tag name, I've got it looking for at least 3 spaces, then 4 characters, then 4 spaces (with _ standing for a space): _{3}[a-zA-Z0-9]{4}_{4} That part is easy enough.
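As a rough illustration of the tag step, here is a minimal Python sketch. It assumes the sample line really contains runs of four spaces around the tag, and it adds a capture group (not in the pattern above) so that only the tag itself is returned. The names line, tag_pattern and tag are hypothetical.

```python
import re

# Sample line from the question; the runs of four spaces are an
# assumption about how the real spec text is laid out.
line = "4.20    AENC    [#sec4.20 Audio encryption]"

# " {3,}" means three or more spaces (" {3}" would mean exactly three).
# The parentheses capture only the four-character tag.
tag_pattern = re.compile(r" {3,}([A-Za-z0-9]{4}) {4}")

match = tag_pattern.search(line)
tag = match.group(1) if match else None
print(tag)
```

Without the capture group, the match would also include the surrounding spaces, which would then need stripping.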

The second part, the definition, is not working out for me. So far, I've got (?<=(\[#.+?))_A, which should find, but not include, the [# plus an indeterminate run of characters, up to the point where it finds _A, but it's failing. If I remove .+? and replace _A with s, it works out alright. What is going wrong? (*The underscores represent spaces, which don't show up on SO.)
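One possible explanation, at least on the Python side: the built-in re module only accepts fixed-width lookbehind, so a lookbehind containing .+? is rejected when the pattern is compiled. Whether RegExr has the same restriction is not confirmed here. The sketch below checks that behaviour and shows a capture group as one alternative that avoids lookbehind entirely. The names lookbehind_ok and definition are hypothetical.

```python
import re

line = "4.20    AENC    [#sec4.20 Audio encryption]"

# Try to compile the lookbehind from the question (with a real space
# before the A). Python's re raises re.error for variable-width lookbehind.
try:
    re.compile(r"(?<=(\[#.+?)) A")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Alternative: match the "[#secX.YY " prefix normally and capture
# only the text between it and the closing bracket.
match = re.search(r"\[#\S+ (.+?)\]", line)
definition = match.group(1) if match else None

print(lookbehind_ok, definition)
```

The shortened version with s in place of _A works because (?<=(\[#)) has a fixed width of two characters.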

How do I grab the definition (i.e., Audio encryption) of the ID3v2 tag from the line using regex?
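For the overall goal of filling a dict, one approach is a single pattern with two capture groups, applied line by line. This is only a sketch: FRAME_LINE, parse_frames, sample and my_dict are hypothetical names, and the second and third sample lines are assumed to follow the same layout as the first line quoted above.

```python
import re

# Group 1 is the four-character tag, group 2 is the definition text
# between the "[#secX.YY " prefix and the closing bracket.
FRAME_LINE = re.compile(r"\s+([A-Za-z0-9]{4})\s+\[#\S+\s+(.+?)\]")


def parse_frames(text):
    """Return a {tag: definition} dict for every line that matches."""
    frames = {}
    for line in text.splitlines():
        match = FRAME_LINE.search(line)
        if match:
            tag, definition = match.groups()
            frames[tag] = definition
    return frames


# Illustrative input; only the first line is taken from the question.
sample = """4.20    AENC    [#sec4.20 Audio encryption]
4.15    APIC    [#sec4.15 Attached picture]
4.11    COMM    [#sec4.11 Comments]"""

my_dict = parse_frames(sample)
print(my_dict)
```

Using \s+ rather than a fixed number of spaces makes the pattern tolerant of tabs or uneven spacing, at the cost of being less strict about the layout.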

© Stack Overflow or respective owner
