Abstract
Yeanpypa is a framework written in Python and heavily inspired by tools like pyparsing and boost::spirit. It can be used to construct recursive-descent parsers directly in Python in a nearly "natural" way, meaning that the necessary Python source code looks much like the original EBNF that defined the grammar.
This section shows how to get and install yeanpypa and how to use it in your own programs, and gives some simple (and less simple) usage examples.
Yeanpypa can be fetched directly from the Subversion
repository at https://vcs.slash-me.net/snippets/yeanpypa/. Get the
file yeanpypa.py and copy it into your
Python distribution (normally in some place like
/usr/lib/python2.4/ under Linux, see your
distribution's documentation for further details). The API
documentation (which is far more complete than this document)
can be found under http://www.slash-me.net/dev/snippets/yeanpypa/.
First of all: construct an EBNF grammar for the language you'd like to parse. A short example for parsing floating point numbers in C:
Example 1. An example of an EBNF for parsing floating point numbers
digit ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "0" ;
number ::= digit+ ;
float ::= [ number ] "." number ;
This example defines a digit as one of the characters 0 to 9, a number as at least one digit, and a floating point number as an optional number, followed by a dot, followed by a number.
This definition isn't totally correct as it would allow constructions like "0002.00". For simplicity's sake, we'll go with it.
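To see which strings the EBNF above accepts, the same three rules can be written as a regular expression. This is a minimal sketch that uses only Python's standard `re` module, not yeanpypa; the names `FLOAT_RE` and `is_float` are made up for this illustration.

```python
import re

# digit  ::= "0" .. "9"
# number ::= digit+
# float  ::= [ number ] "." number
# \Z anchors the match at the very end of the input.
FLOAT_RE = re.compile(r'(?:[0-9]+)?\.[0-9]+\Z')


def is_float(text):
    """Return True if the whole text matches the float rule."""
    return FLOAT_RE.match(text) is not None


accepted = [s for s in ['123.123', '.5', '0002.00'] if is_float(s)]
rejected = [s for s in ['123', '1.', '8.x'] if not is_float(s)]
```

As the note above says, "0002.00" is accepted, because nothing in the grammar restricts leading zeros.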
In order to construct a yeanpypa-grammar from the EBNF, we write:
Example 2. yeanpypa representation of the floating point parser
from yeanpypa import *
digit = Literal('1') | Literal('2') | Literal('3') | Literal('4') | \
Literal('5') | Literal('6') | Literal('7') | Literal('8') | \
Literal('9') | Literal('0')
number = Word(digit)
float_num = Optional(number) + Literal('.') + number
In order to save typing, yeanpypa already provides
a set of abstractions. The whole digit
thing for example could be left out, as it is already provided
by yeanpypa.
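The reason such a grammar can look so much like EBNF is operator overloading: `|` and `+` build rule objects instead of computing values. The following is a rough sketch of that idea, not yeanpypa's actual implementation. All class and function names here (`Rule`, `Lit`, `Seq`, `Alt`, `Opt`, `Many1`, `parse_all`) are hypothetical stand-ins chosen for this illustration.

```python
class Rule(object):
    """A rule maps (text, pos) to (tokens, new_pos), or None on failure."""

    def __add__(self, other):   # a + b builds a sequence
        return Seq(self, other)

    def __or__(self, other):    # a | b builds an ordered choice
        return Alt(self, other)


class Lit(Rule):
    def __init__(self, s):
        self.s = s

    def match(self, text, pos):
        if text.startswith(self.s, pos):
            return [self.s], pos + len(self.s)
        return None


class Seq(Rule):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def match(self, text, pos):
        left = self.first.match(text, pos)
        if left is None:
            return None
        right = self.second.match(text, left[1])
        if right is None:
            return None
        return left[0] + right[0], right[1]


class Alt(Rule):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def match(self, text, pos):
        # A successful result is a 2-tuple and therefore always truthy.
        return self.first.match(text, pos) or self.second.match(text, pos)


class Opt(Rule):
    def __init__(self, rule):
        self.rule = rule

    def match(self, text, pos):
        # An absent optional part succeeds without producing a token.
        return self.rule.match(text, pos) or ([], pos)


class Many1(Rule):
    """One or more repetitions, joined into a single token (digit+)."""

    def __init__(self, rule):
        self.rule = rule

    def match(self, text, pos):
        start = pos
        result = self.rule.match(text, pos)
        while result is not None and result[1] > pos:
            pos = result[1]
            result = self.rule.match(text, pos)
        if pos == start:
            return None
        return [text[start:pos]], pos


def parse_all(rule, text):
    """Return the token list if the rule consumes all of text, else None."""
    result = rule.match(text, 0)
    if result is None or result[1] != len(text):
        return None
    return result[0]


digit = (Lit('0') | Lit('1') | Lit('2') | Lit('3') | Lit('4') |
         Lit('5') | Lit('6') | Lit('7') | Lit('8') | Lit('9'))
number = Many1(digit)
float_num = Opt(number) + Lit('.') + number

tokens = parse_all(float_num, '123.123')
```

With these definitions, `tokens` is a list of three strings: the integer part, the dot, and the fractional part. Input without a dot, such as '12', makes `parse_all` return `None`.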
The resulting float_num object can be used
to parse a floating point number like this:
Example 3. Using a parser object
result = parse(float_num, '123.123')
if result.full():
    print result.getTokens()
else:
    print 'The parser did not consume all input.'
This will print the following:
['123', '.', '123']
The parser validated the input and created a list of tokens according to the grammar specification.
To use the tokens, we can ignore the dot, as it does not tell us anything apart from the fact that we saw a floating point number (which we already know because the input was validated). That's where hide() comes into play. The hide() method of a rule (the basic building block of a grammar) tells the parser to discard any token created by that rule. We change the grammar like this:
Example 4. Floating point parser ignoring the dot
number = Word(digit)
float_num = Optional(number) + Literal('.').hide() + number
We removed the digit
declaration and instead use the abstraction provided by
yeanpypa.
Note the hide() call on the Literal(...) rule. It instructs the parser to discard the token created by that rule (i.e. the '.') instead of adding it to the output.
Using this parser yields the following output:
['123', '123']
We have successfully eliminated the superfluous dot token from the output.
As we're parsing numbers, we would like to see the tokens as actual numbers instead of strings representing numbers. Yeanpypa provides the tools to transform the strings while matching, using a semantic action:
Example 5. Floating point parser using semantic actions
number = Word(digit).setAction(lambda x: int(x[0]))
float_num = Optional(number) + Literal('.').hide() + number
We have attached a semantic action to the
number rule, which transforms its input
from a string into an integer. The action is called when the
rule successfully matches and receives the list of tokens generated by
that rule. The function must return a list of tokens
representing the desired output of the parser. This may be the
original input list (in case the action merely outputs some
debug information or generates some external data structure)
or it may be a transformed token (list) as in the example
given.
Using this version of the parser yields the following output:
[123, 123]
As you can see, the resulting token list now contains two integers instead of the strings representing them.
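One thing to watch when turning the two parts into a single value: converting the fractional digits with int() drops leading zeros, so '1.05' and '1.5' would both end up as the pair 1 and 5. The sketch below shows an action-style function that keeps the digit strings and builds the float from them. It is plain Python rather than yeanpypa code, `to_float` is a hypothetical name, and it assumes that an absent optional integer part contributes no token, which this document does not confirm.

```python
def to_float(tokens):
    """Combine the digit strings left after hiding the dot into one float.

    Assumption: input such as '.5' yields a single token, because the
    optional integer part produced nothing.
    """
    if len(tokens) == 1:
        whole, frac = '0', tokens[0]
    else:
        whole, frac = tokens
    # Join the strings first; int('05') would be 5 and lose the zero.
    return [float(whole + '.' + frac)]


with_zero = to_float(['1', '05'])
without_zero = to_float(['1', '5'])
```

Returning a one-element list follows the convention described above, where an action hands back a token list.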
Attaching an action to a rule where hide() was called causes the action to be executed, but the output to be thrown away. Keep that in mind if you intend to mix these two facilities.
An action called for a subrule will
NOT be notified if the containing rule
fails at a later stage. E.g. if the first
number rule in the above example matched,
but float_num failed due to a missing dot,
the action of the first number would
already have been called and would not be notified about the
failure. Keep that in mind when constructing external data
structures using semantic actions.
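One way to cope with this is to let actions write into a temporary buffer and copy the buffer into the real data structure only after the whole parse has succeeded. The sketch below demonstrates the pattern with a small hand-written stand-in for the float_num rule. It does not use yeanpypa, and `parse_float` and `parse_and_commit` are hypothetical names for this illustration.

```python
import re

_NUMBER = re.compile(r'[0-9]+')


def parse_float(text, on_number):
    """Stand-in for float_num that calls on_number for each number matched."""
    pos = 0
    m = _NUMBER.match(text, pos)
    if m:                              # optional leading number
        on_number(m.group())           # fires as soon as the subrule matches
        pos = m.end()
    if not text.startswith('.', pos):  # the enclosing rule fails here,
        return False                   # but the action above has already run
    pos += 1
    m = _NUMBER.match(text, pos)
    if not m:
        return False
    on_number(m.group())
    return m.end() == len(text)


def parse_and_commit(text, store):
    """Buffer the action results; copy them to store only on full success."""
    pending = []
    ok = parse_float(text, pending.append)
    if ok:
        store.extend(pending)
    return ok


naive = []
parse_float('123', naive.append)   # fails on the missing dot

safe = []
parse_and_commit('123', safe)      # fails, nothing is committed
parse_and_commit('1.5', safe)      # succeeds, both numbers are committed
```

After this runs, `naive` still holds the number from the failed parse, while `safe` holds only the two numbers from the successful one.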