PyParsing: Not all tokens passed to setParseAction()

Posted by Rosarch on Stack Overflow
Published on 2010-05-30T22:03:38Z Indexed on 2010/05/30 22:12 UTC

I'm parsing sentences like "CS 2110 or INFO 3300". I would like to output a format like:

[[("CS", 2110)], [("INFO", 3300)]]

To do this, I thought I could use setParseAction(). However, the print statements in statementParse() suggest that only the last tokens are actually passed:

>>> statement.parseString("CS 2110 or INFO 3300")
Match [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] at loc 7(1,8)
string CS 2110 or INFO 3300
loc: 7 
tokens: ['INFO', 3300]
Matched [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] -> ['INFO', 3300]
(['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]})

I expected all the tokens to be passed, but it's only ['INFO', 3300]. Am I doing something wrong? Or is there another way that I can produce the desired output?

Here is the pyparsing code:

from pyparsing import *

def statementParse(s, location, tokens):
    # Debugging parse action: print whatever pyparsing passes in.
    print("string %s" % s)
    print("loc: %s " % location)
    print("tokens: %s" % tokens)

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")

OR_CONJ = Suppress("or")

COURSE_NUMBER.setParseAction(lambda s, l, toks : int(toks[0]))

course = DEPT_CODE + COURSE_NUMBER.setResultsName("Course")

statement = course + Optional(OR_CONJ + course).setParseAction(statementParse).setDebug()
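A possible explanation, offered as a sketch rather than a confirmed answer: in Python a method call binds more tightly than `+`, so in the last line `.setParseAction(statementParse)` is attached only to `Optional(OR_CONJ + course)`, not to the whole statement. That would match the debug output, which reports the match at loc 7 and only the tokens `['INFO', 3300]`. The version below wraps the whole expression in parentheses before attaching the action, and uses `Group` so each course arrives as its own sub-list. The names `statement_parse`, `seen`, and `parse_courses` are made up for this illustration, and the results names are left out to keep it short.

```python
from pyparsing import Group, Optional, Regex, Suppress

DEPT_CODE = Regex(r'[A-Z]{2,}')
# Convert the matched course number from str to int.
COURSE_NUMBER = Regex(r'[0-9]{4}').setParseAction(lambda s, l, toks: int(toks[0]))
OR_CONJ = Suppress("or")

# Group keeps each (department, number) pair together as one nested token.
course = Group(DEPT_CODE + COURSE_NUMBER)

seen = []  # hypothetical log of what the parse action receives


def statement_parse(s, loc, tokens):
    # Record the tokens; returning None leaves them unchanged.
    seen.append(tokens.asList())


# The parentheses make the parse action apply to the whole statement,
# not just to the Optional(...) part.
statement = (course + Optional(OR_CONJ + course)).setParseAction(statement_parse)


def parse_courses(text):
    # Reshape the grouped tokens into the format asked for in the question.
    result = statement.parseString(text)
    return [[(dept, num)] for dept, num in result.asList()]


print(parse_courses("CS 2110 or INFO 3300"))
# → [[('CS', 2110)], [('INFO', 3300)]]
```

With the parentheses in place, the parse action is called once with both courses, so `seen` ends up holding `[['CS', 2110], ['INFO', 3300]]` for the example sentence. Reshaping after parsing, instead of inside the parse action, is just one way to get the nested output.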

© Stack Overflow or respective owner
