yeanpypa – YEt ANother PYthon PArser framework

Markus Brückner

Abstract

Yeanpypa is a framework written in Python and heavily inspired by tools like pyparsing and boost::spirit. It can be used to construct recursive-descent parsers directly in Python in a nearly "natural" way, meaning that the necessary Python source code looks much like the original EBNF that defined the grammar.


Table of Contents

Introduction
Getting & installing yeanpypa
Using yeanpypa

Introduction

This document shows how to get and install yeanpypa and how to use it in your own programs, and it gives some simple (and less simple) usage examples.

Getting & installing yeanpypa

Yeanpypa can be fetched directly from the Subversion repository at https://vcs.slash-me.net/snippets/yeanpypa/. Get the file yeanpypa.py and copy it into your Python distribution (normally somewhere like /usr/lib/python2.4/ under Linux; see your distribution's documentation for further details). The API documentation (which is far more complete than this document) can be found at http://www.slash-me.net/dev/snippets/yeanpypa/.

Using yeanpypa

First of all: construct an EBNF grammar for the language you'd like to parse. A short example for parsing floating point numbers in C:

Example 1. An example of an EBNF for parsing floating point numbers

digit  ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "0" ;
number ::= digit+ ;
float  ::= [ number ] "." number ;
          

This example defines a digit as one of the characters from 0 to 9, a number as at least one digit, and a floating point number as an optional number, followed by a dot, followed by a number.

Note

This definition isn't totally correct as it would allow constructions like "0002.00". For simplicity's sake, we'll go with it.
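To get a feeling for which strings the grammar accepts, the EBNF can be mirrored in a regular expression. The following is only a sketch that uses Python's standard re module, not yeanpypa; the names FLOAT_RE and is_float are made up for this illustration.

```python
import re

# number ::= digit+ ;  float ::= [ number ] "." number ;
# [0-9] is used instead of \d so that only the ten ASCII digits match.
FLOAT_RE = re.compile(r'(?:[0-9]+)?\.[0-9]+\Z')


def is_float(text):
    """Return True if the whole string matches the float grammar."""
    return FLOAT_RE.match(text) is not None


# Print each sample together with the verdict of the grammar.
for sample in ['123.123', '.5', '0002.00', '12.', '12']:
    print('%-8s %s' % (sample, is_float(sample)))
```

The first three samples are accepted (including "0002.00", as mentioned in the note), while "12." and "12" are rejected because the grammar requires a dot followed by at least one digit.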

In order to construct a yeanpypa-grammar from the EBNF, we write:

Example 2. yeanpypa representation of the floating point parser

from yeanpypa import *

digit     = Literal('1') | Literal('2') | Literal('3') | Literal('4') | \
            Literal('5') | Literal('6') | Literal('7') | Literal('8') | \
            Literal('9') | Literal('0')
number    = Word(digit)
float_num = Optional(number) + Literal('.') + number
          

Note

In order to save typing, yeanpypa already provides a set of abstractions. The whole digit thing for example could be left out, as it is already provided by yeanpypa.

The resulting float_num object can be used to parse a floating point number like this:

Example 3. Using a parser object

result = parse(float_num, '123.123')
if result.full():
    print result.getTokens()
else:
    print 'The parser did not consume all input.'
          

This will print the following:

['123', '.', '123']

The parser validated the input and created a list of tokens according to the grammar specification.

To use the tokens, we can ignore the dot, as it does not tell us anything apart from the fact that we saw a floating point number (which we know because of the validation anyway). That's where hide() comes into play. The hide() method of a rule (the basic building block of a grammar) tells the parser to ignore any token created by the rule. We change the grammar like this:

Example 4. Floating point parser ignoring the dot

number    = Word(digit)
float_num = Optional(number) + Literal('.').hide() + number
          

Note

We removed the digit declaration and rather use the abstraction provided by yeanpypa.

Note the hide()-call at the Literal(...)-rule. This instructs the parser to ignore the token created by that rule (i.e. the '.') and not create any output.

Using this parser yields the following output:

['123', '123']

We have successfully eliminated the superfluous dot token from the output.
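The effect of hiding a rule can be modelled in a few lines of plain Python. This is a toy stand-in written for this text and not yeanpypa's implementation: the classes Rule, Lit, Digits and Seq and their parse()/match() methods are hypothetical and only share the hide() name with the real library.

```python
class Rule(object):
    """Toy rule (hypothetical, not yeanpypa's class)."""

    def __init__(self):
        self.hidden = False

    def hide(self):
        # Mark the rule as hidden and return it so calls can be chained.
        self.hidden = True
        return self

    def parse(self, text, pos):
        # Returns (new_pos, tokens) on success and None on failure.
        result = self.match(text, pos)
        if result is None:
            return None
        new_pos, tokens = result
        # A hidden rule still consumes input but contributes no tokens.
        return new_pos, ([] if self.hidden else tokens)


class Lit(Rule):
    """Matches one fixed string."""

    def __init__(self, string):
        Rule.__init__(self)
        self.string = string

    def match(self, text, pos):
        if text.startswith(self.string, pos):
            return pos + len(self.string), [self.string]
        return None


class Digits(Rule):
    """Matches one or more ASCII digits."""

    def match(self, text, pos):
        end = pos
        while end < len(text) and text[end] in '0123456789':
            end += 1
        if end == pos:
            return None
        return end, [text[pos:end]]


class Seq(Rule):
    """Matches all sub-rules one after another."""

    def __init__(self, *rules):
        Rule.__init__(self)
        self.rules = rules

    def match(self, text, pos):
        tokens = []
        for rule in self.rules:
            result = rule.parse(text, pos)
            if result is None:
                return None
            pos, new_tokens = result
            tokens.extend(new_tokens)
        return pos, tokens


plain = Seq(Digits(), Lit('.'), Digits()).parse('123.123', 0)
hidden = Seq(Digits(), Lit('.').hide(), Digits()).parse('123.123', 0)
print(plain[1])
print(hidden[1])
```

In this model both parsers consume the same seven characters; only the token list differs, because the hidden literal drops its token after matching.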

As we're parsing numbers, we would like to see the tokens as actual numbers instead of strings representing numbers. Yeanpypa provides the tools to transform the strings while matching using a semantic action:

Example 5. Floating point parser using semantic actions

number    = Word(digit).setAction(lambda x: int(x[0]))
float_num = Optional(number) + Literal('.').hide() + number
          

We have attached a semantic action to the number-rule, which transforms its input from a string into an integer. The action is called when the rule successfully matches and receives the list of tokens generated by that rule. The function must return a list of tokens representing the desired output of the parser. This may be the original input list (in case the action merely outputs some debug information or generates some external data structure) or it may be a transformed token (list) as in the example given.

Using this version of the parser yields the following output:

[123, 123]

As you can see, the resulting token list now contains two integers instead of the strings representing them.
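Actions do not have to be lambdas; any callable that takes the token list will do. The two functions below sketch the two kinds of action described above, one that transforms its tokens and one that only reports them and passes them through. The names to_int and trace are made up for this illustration, and both follow the "return a list of tokens" convention from the text; check the API documentation for how yeanpypa treats other return values.

```python
def to_int(tokens):
    """Transforming action: turn the matched digit string into an integer."""
    return [int(tokens[0])]


def trace(tokens):
    """Pass-through action: report the match, leave the tokens unchanged."""
    print('matched: %r' % (tokens,))
    return tokens


# Calling the actions by hand, the way a parser would after a match.
converted = to_int(['123'])
unchanged = trace(['123'])
print(converted)
```

Under these assumptions, either function could be passed to setAction() in place of the lambda from the example.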

Note

Attaching an action to a rule where hide() was called causes the action to be executed, but the output to be thrown away. Keep that in mind if you intend to mix these two facilities.
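The interplay described in this note can be pictured with a small model. This is not yeanpypa code: apply_rule is a hypothetical helper that only mimics the order of events, running the action first and discarding the output afterwards if the rule is hidden.

```python
calls = []


def noisy(tokens):
    """Action with a visible side effect: remember every call."""
    calls.append(list(tokens))
    return tokens


def apply_rule(tokens, action=None, hidden=False):
    """Hypothetical model of what happens after a rule has matched."""
    if action is not None:
        tokens = action(tokens)        # the action always runs
    return [] if hidden else tokens    # hiding drops the result afterwards


output = apply_rule(['.'], action=noisy, hidden=True)
print(output)
print(calls)
```

The output list is empty, but calls shows that the action still ran once with the dot token.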

Note

An action called for a subrule will NOT be notified if the containing rule fails at a later stage. E.g. if the first number rule in the above example matched, but float_num failed due to a missing dot, the action of the first number would already have been called and would not be notified about the failure. Keep that in mind when constructing external data structures using semantic actions.
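One way to cope with this is to let the actions only stage their data and to commit it once the whole parse has succeeded. The following sketch shows the idea with a hand-written toy parser instead of yeanpypa; parse_digits, parse_float and parse_and_commit are hypothetical helpers invented for this illustration.

```python
DIGITS = '0123456789'


def parse_digits(text, pos, action):
    """Match one or more digits at pos and run the action on the token."""
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    if end == pos:
        return None
    return end, action([text[pos:end]])


def parse_float(text, action):
    """Toy version of float_num: optional digits, a dot, digits."""
    pos, tokens = 0, []
    first = parse_digits(text, pos, action)   # the action fires here ...
    if first is not None:
        pos, tokens = first
    if not text.startswith('.', pos):
        return None                           # ... even if we fail here
    second = parse_digits(text, pos + 1, action)
    if second is None:
        return None
    return tokens + second[1]


# Naive action: writes straight into the external data structure.
log = []


def record(tokens):
    log.append(tokens[0])
    return tokens


failed = parse_float('123x456', record)


def parse_and_commit(text, committed):
    """Stage data during parsing and commit it only on success."""
    pending = []

    def stage(tokens):
        pending.append(tokens[0])
        return tokens

    result = parse_float(text, stage)
    if result is not None:
        committed.extend(pending)
    return result


committed = []
parse_and_commit('123x456', committed)   # fails, nothing is committed
parse_and_commit('123.456', committed)   # succeeds, both numbers are committed
print(failed, log, committed)
```

After the failed parse, log already contains '123' although no floating point number was recognised, whereas committed only receives data from the input that parsed completely.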