Python Code Generator for ANTLR 2.7.7

As of ANTLR 2.7.5, you can generate your Lexers, Parsers and TreeParsers in Python. This feature extends the benefits of ANTLR's predicated-LL(k) parsing technology to the Python language and platform.

To be able to build and use the Python language Lexers, Parsers and TreeParsers, you will need to have the ANTLR Python runtime library installed in your Python path. The Python runtime model is based on the existing runtime model for Java and is thus immediately familiar. The Python runtime and the Java runtime are very similar, although there are a number of subtle (and not so subtle) differences. Some of these result from differences in the respective runtime environments.

ANTLR Python support was contributed (and is to be maintained) by Wolfgang Haefelinger and Marq Kole.

Building the ANTLR Python Runtime

The ANTLR Python runtime source and build files are completely integrated in the ANTLR build process. The ANTLR runtime support module for Python is located in the lib/python subdirectory of the ANTLR distribution. Installation of the Python runtime support is enabled automatically if the configure script can find Python on your system.

With Python support enabled, the current distribution will look for a python executable of version 2.2 or higher. If it finds such a beast, it will generate and install the ANTLR Python runtime as part of the overall ANTLR build and installation process.

If the python distribution you are using is at an unusual location, perhaps because you are using a local installation instead of a system-wide one, you can provide the location of that python executable using the --with-python=<path> option for the configure script, for instance:

./configure --with-python=$HOME/bin/python2.3

Also, if the python executable is at a regular location but has a name other than "python", you can specify the correct name either through the --with-python=<path> option, as shown above, or through the environment variable $PYTHON:

PYTHON=python2.3
export PYTHON
./configure
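Before running configure, you may want to check that the interpreter named by $PYTHON (or plain python) actually meets the version 2.2 requirement. The following is a small Python sketch, not the check the configure script itself performs: it simply runs the candidate interpreter and asks it for its version. The helper name python_version is hypothetical, and the sketch itself needs Python 3.3 or later (for shutil.which), even though the interpreter it inspects may be older.

```python
import os
import shutil
import subprocess
import sys

# Minimum interpreter version for ANTLR's Python support, as stated above.
MIN_VERSION = (2, 2)

# Probe run inside the candidate interpreter; prints e.g. "2 3".
# The print(...) form is valid in both Python 2 and Python 3.
_VERSION_PROBE = "import sys; print('%d %d' % sys.version_info[:2])"


def python_version(executable):
    """Hypothetical helper: return (major, minor) of an interpreter, or None.

    `executable` may be a bare name looked up in $PATH (e.g. "python2.3")
    or a full path to the interpreter.
    """
    path = shutil.which(executable) or executable
    try:
        output = subprocess.check_output([path, "-c", _VERSION_PROBE])
    except (OSError, subprocess.CalledProcessError):
        # Not found, not executable, or it exited with an error.
        return None
    major, minor = output.decode("ascii").split()
    return int(major), int(minor)


if __name__ == "__main__":
    # Use $PYTHON if it is set, otherwise plain "python",
    # mirroring the choice described above.
    candidate = os.environ.get("PYTHON", "python")
    version = python_version(candidate)
    if version is None:
        sys.stderr.write("%s could not be run\n" % candidate)
    elif version < MIN_VERSION:
        print("%s is Python %d.%d: too old for ANTLR" % ((candidate,) + version))
    else:
        print("%s is Python %d.%d: recent enough" % ((candidate,) + version))
```

Running it with PYTHON=python2.3 set in the environment reports on that interpreter; without $PYTHON it inspects whatever python is first in your $PATH.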

All the example grammars for the ANTLR Python runtime are built when ANTLR itself is built. They can be run in one go by running make test in the same directory where you ran the configure script in the ANTLR distribution. So after you've run configure you can do:

# Build ANTLR and all examples
make
# Run them
make test
# Install everything
make install

Note that make install will not add the ANTLR Python runtime (i.e. antlr.py) to your Python installation but rather install antlr.py in ${prefix}/lib. To be able to use antlr.py you would need to adjust Python's sys.path.
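If you prefer not to install antlr.py into your Python installation, a script can extend sys.path itself before importing the runtime. This is a minimal sketch, assuming antlr.py sits in ${prefix}/lib: the ANTLR_PREFIX environment variable and the add_antlr_to_path helper are hypothetical names chosen for this example, and /usr/local is only the usual autoconf default for ${prefix}.

```python
import os
import sys

# Hypothetical: the ${prefix} given to ./configure, read from an
# ANTLR_PREFIX environment variable invented for this example.
# /usr/local is merely the usual autoconf default.
ANTLR_PREFIX = os.environ.get("ANTLR_PREFIX", "/usr/local")


def add_antlr_to_path(prefix):
    """Hypothetical helper: append ${prefix}/lib to sys.path (only once)."""
    lib_dir = os.path.join(prefix, "lib")
    if lib_dir not in sys.path:
        # Appending rather than prepending means modules that are already
        # on sys.path win over anything else that lives in ${prefix}/lib.
        sys.path.append(lib_dir)
    return lib_dir


lib_dir = add_antlr_to_path(ANTLR_PREFIX)

try:
    import antlr  # antlr.py, as installed by "make install"
except ImportError:
    sys.stderr.write("antlr.py not found in %s\n" % lib_dir)
```

Setting the PYTHONPATH environment variable to ${prefix}/lib before starting Python achieves the same effect without modifying the script.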

However, a script is provided that lets you easily add antlr.py as a module to your Python installation. After installation, just run:

${prefix}/sbin/pyantlr.sh install

Note that you usually need to be superuser for this to succeed. Also note that you can run this command again at any later time, for example if you have a second Python installation. Just make sure that python is in your $PATH when running pyantlr.sh.

Note further that you can also install the ANTLR Python runtime this way immediately after having called ./configure:

scripts/pyantlr.sh install
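Because pyantlr.sh relies on finding python in your $PATH, it can help to see which interpreters a plain python resolves to, for instance when a second Python installation is present. The following Python 3 sketch (POSIX only; pythons_on_path is a hypothetical helper) lists every executable named python in $PATH in lookup order. It assumes, as the note above suggests but does not spell out, that pyantlr.sh installs into the first one found.

```python
import os


def pythons_on_path(name="python", path=None):
    """Hypothetical helper: list every executable called `name` in $PATH.

    Entries are returned in lookup order, without duplicates. The first
    entry is the one a shell runs for a plain `name`; that pyantlr.sh
    uses the same one is an assumption.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        # An empty $PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                and candidate not in found):
            found.append(candidate)
    return found


if __name__ == "__main__":
    interpreters = pythons_on_path()
    if not interpreters:
        print("no executable named 'python' in $PATH")
    for index, interpreter in enumerate(interpreters):
        note = "  <- found first" if index == 0 else ""
        print(interpreter + note)
```

To install into a different Python installation, putting that installation's bin directory first in $PATH before running pyantlr.sh install should then make it the one that is picked up.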

Specifying Code Generation

You can instruct ANTLR to generate your Lexers, Parsers and TreeParsers using the Python code generator by adding the following entry to the global options section at the beginning of your grammar file.

options {
    language="Python";
}

After that, things are pretty much the same as in the default Java code generation mode. See the examples in examples/python for some illustrations.

One particular issue that is worth mentioning is the handling of comments in ANTLR Python. Java, C++, and C# all use the same lexical structures to define comments: // for single-line comments, and /* ... */ for block comments. Unfortunately, Python does not handle comments this way. It only knows about single-line comments, and these start off with a # symbol.

Normally, all comments outside of actions are comments in the ANTLR input language. These comments, both block comments and single-line comments, are translated into Python single-line comments.

Secondly, all comments inside actions should be comments in the target language, Python in this case. Unfortunately, if an action contains ANTLR action symbols such as $getText, the code generator seems to choke on Python comments, because the # sign is also used for tree construction. The solution is to use Java/C++-style comments in all actions; ANTLR translates these into Python comments while it checks the actions for the presence of predefined action symbols such as $getText.

So, as a general rule: all comments in an ANTLR grammar for the Python target should be in Java/C++ style, not in Python style.
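To make the comment translation concrete, here is a small Python illustration of what turning Java/C++-style comments into Python comments amounts to: a // comment becomes a single # comment, and a /* ... */ block becomes one # line per original line. This is a toy regular-expression sketch written for this page, not ANTLR's actual implementation; in particular it ignores comment markers inside string literals and does not keep the indentation of the surrounding code.

```python
import re

# Either a // comment (group 1: rest of the line) or a /* ... */ block
# comment (group 2: its body, possibly spanning several lines).
_COMMENT = re.compile(r"//([^\n]*)|/\*(.*?)\*/", re.DOTALL)


def to_python_comments(text):
    """Toy illustration: rewrite Java/C++-style comments as '#' comments.

    Not ANTLR's real translator: comment markers inside string literals
    are not recognized and are rewritten too.
    """

    def replace(match):
        if match.group(1) is not None:
            # Single-line comment: keep its text, swap the marker.
            return "#" + match.group(1).rstrip()
        # Block comment: emit one '#' line per line of its body.
        body_lines = match.group(2).split("\n")
        return "\n".join("#" + line.rstrip() for line in body_lines)

    return _COMMENT.sub(replace, text)
```

For example, to_python_comments("x = 1 // set x") yields "x = 1 # set x", and a two-line block comment turns into two consecutive # lines.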

Python-Specific ANTLR Sections

Python-Specific ANTLR Options

A Template Python ANTLR Grammar File

As the handling of modules (packages in Java speak) in Python differs from that in Java, the current approach in ANTLR to call both the file and the class it contains after the name of the