Grab two parts of a single, short string
- by TankorSmash
I'm looking to fill a python dict with TAG:definition pairs, and I'm using RegExr http://gskinner.com/RegExr/ to write the regex
My first step is to parse a line, from http://www.id3.org/id3v2.3.0, or http://pastebin.com/VJEBGauL and pull out the ID3 tag and the associated definition. For example the first line:
4.20 AENC [#sec4.20 Audio encryption]
would look like this myDict = {'AENC' : 'Audio encryption'}
To grab the tag name, I've got it looking for at least 3 spaces, then 4 characters, then 4 spaces: {3}[a-zA-Z0-9]{4} {4} That part is easy enough.
The second part, the definition, is not working out for me. So far, I've got (?<=(\[#.+?)) A Which should find, but not include the [# as well as an indeterminded set of characters until it finds: _A, but it's failing. If I remove .+? and replace _A with s it works out alright. What is going wrong? *The underscores represent spaces, which don't show up on SO.
How do I grab the definition, ie,(Audio encryption) of the ID3v2 tag from the line, using RegEx?