The OpenToken package is a facility for performing token analysis and parsing within the Ada language. It is designed to provide all the functionality of a traditional lexical analyzer/parser generator, such as lex/yacc. But due to the magic of inheritance and runtime polymorphism it is implemented entirely in Ada as withed-in code. No precompilation step is required, and no messy tool-generated source code is created.
For example, here is an OpenToken LR specification of a typical arithmetic grammar, taken from Example 5.10, section 5.3, page 295 of the red dragon book:
Grammar specification: L -> E EOF E -> E + T E -> T T -> T * F T -> F F -> ( E ) F -> integer Ada code: Grammar : constant Production_List.Instance := L <= E & EOF + Print_Value'Access and E <= E & Plus & T + Add_Integers and E <= T and T <= T & Times & F + Multiply_Integers and T <= F and F <= Left_Paren & E & Right_Paren + Synthesize_Second and F <= Int_Literal;
This grammar is processed at runtime into a state machine data structure, then used to parse input text. It uses dynamic memory allocation at runtime to hold a stack of the tokens and productions.
And here is a specification of an equivalent grammar, rewritten to allow a recursive-descent implementation:
Grammar specification:
L -> E EOF
E -> T {+ T}
T -> F {* F}
F -> ( E )
F -> integer
Ada code:
E : Operation_List.Handle := new Operation_List.Instance;
F : Integer_Selection.Handle :=
(Left_Paren & E & Right_Paren + Build_Parens'Access or Int) + Build_Selection'Access;
T : Operation_List.Handle := F ** Times * Times_Element'Access + Init_Times'Access;
L : Integer_Sequence.Handle := E & EOF + Build_Print'Access;
E.all := Operation_List.Get
(Element => T,
Separator => Plus,
Initialize => Init_Plus'Access,
Add_Element => Plus_Element'Access);
This grammar is used directly at runtime to parse input text. It does not use dynamic memory allocation during parsing (it is used while building the grammar), but it does use more stack space than the LR version. Both parsers are reasonably efficient.
OpenToken also allows implementing horribly inefficient recursive descent grammars that still work; it supports infinite lookahead with backtracking. This can be useful when just getting started with parsers; first get it working, then optimize it.
The lexer phase of OpenToken parsers is implemented by classes of recognizers. Each recognizer runs in parallel, and the one matching the longest input lexeme wins. This is not as efficient as a tradational lexers, for example those generated by lex, since those use a finite state machine. But it is easier to use, because the recognizers are already written, well documented, and can be independently tested.
Some reusable parse tokens are also included in OpenToken, but they tend to be more domain-specific.
Ada's type safety features make misbehaving lexers and parsers easier to debug. In addition, OpenToken includes a trace feature, that outputs useful information as the parse progresses, also aiding debugging.
OpenToken is distributed under GPL version 3, with the GNAT modification that allows use in non-GPL projects.
OpenToken was originally written by Ted Dennison. Contributions were made by Christoph Grein and Stephen Leake.
OpenToken is currently maintained by Stephen Leake. Submit bugs via the Debian Bug system, against the opentoken source package. Discussion about OpenToken is on the newsgroup comp.lang.ada.
OpenToken may be obtained in several ways:
OpenToken has been tested on the following operating systems/compilers:
The simplest way to use the source distribution is to install it in the standard GNAT directories, where the compiler will find it by default. Then just put 'with "OpenToken";' in your project file. To install it on Windows, using a DOS shell, and assuming GNAT and 'make' are in your PATH:
cd opentoken/Build/windows_release make -f Makefile.install install
On GNU/Linux, use 'linux_release' instead of 'windows_release'. This assumes you have write permission in the GNAT directory tree.
See developer instructions if you want to run the tests or work on OpenToken.
Ted's original OpenToken web page is no longer being maintained. The AdaPower mailing list and discussion board are no longer used.
There is a User's Guide that describes how to create a simple application using OpenToken. There are also several examples in the source; see the Examples directory. The Test directory also has useful examples, although oriented towards testing OpenToken itself.
OpenToken uses this test page to verify that the HTML lexer handles valid HTML 4.01.
tag; the contents are treated as a comment.
This version contains another code reorganization to go with another new parsing facility. This time it is recursive decent parsing. The new method has the following advantages over table-driven parsers:
The disadvantages are:
Given the above balance, I do intend to make this the standard supported parsing facility for future versions of OpenToken. The "b" designation is there to indicate that some things might not be in quite their permanent form yet, and that there isn't yet the full set of reusable tokens to support it that I would like to see in a release. I'm hoping for feedback both in the form of criticisms/suggestions, and reusable tokens in order to help finalize this facility.
A general list of the changes is below:
This is the first version to include parsing capability. The existing packages underwent a major reorganization to accommodate the new functionality. As some of the restructuring that was done is incompatible with old code, the major revision has been bumped up to 2. A partial list of changes is below:
This version fixes a rare bug in the Ada style based numeric recognizers. The SLOC counter can now successhe Tok OpenToken
The OpenToken package is a facility for performing token analysis and parsing within the Ada language. It is designed to provide all the functionality of a traditional lexical analyzer/parser generator, such as lex/yacc. But due to the magic of inheritance and runtime polymorphism it is implemented entirely in Ada as withed-in code. No precompilation step is required, and no messy tool-generated source code is created.
For example, here is an OpenToken LR specification of a typical arithmetic grammar, taken from Example 5.10, section 5.3, page 295 of the red dragon book:
Grammar specification: L -> E EOF E -> E + T E -> T T -> T * F T -> F F -> ( E ) F -> integer Ada code: Grammar : constant Production_List.Instance := L <= E & EOF + Print_Value'Access and E <= E & Plus & T + Add_Integers and E <= T and T <= T & Times & F + Multiply_Integers and T <= F and F <= Left_Paren & E & Right_Paren + Synthesize_Second and F <= Int_Literal;
This grammar is processed at runtime into a state machine data structure, then used to parse input text. It uses dynamic memory allocation at runtime to hold a stack of the tokens and productions.
And here is a specification of an equivalent grammar, rewritten to allow a recursive-descent implementation:
Grammar specification:
L -> E EOF
E -> T {+ T}
T -> F {* F}
F -> ( E )
F -> integer
Ada code:
E : Operation_List.Handle := new Operation_List.Instance;
F : Integer_Selection.Handle :=
(Left_Paren & E & Right_Paren + Build_Parens'Access or Int) + Build_Selection'Access;
T : Operation_List.Handle := F ** Times * Times_Element'Access + Init_Times'Access;
L : Integer_Sequence.Handle := E & EOF + Build_Print'Access;
E.all := Operation_List.Get
(Element => T,
Separator => Plus,
Initialize => Init_Plus'Access,
Add_Element => Plus_Element'Access);
This grammar is used directly at runtime to parse input text. It does not use dynamic memory allocation during parsing (it is used while building the grammar), but it does use more stack space than the LR version. Both parsers are reasonably efficient.
OpenToken also allows implementing horribly inefficient recursive descent grammars that still work; it supports infinite lookahead with backtracking. This can be useful when just getting started with parsers; first get it working, then optimize it.
The lexer phase of OpenToken parsers is implemented by classes of recognizers. Each recognizer runs in parallel, and the one matching the longest input lexeme wins. This is not as efficient as a tradational lexers, for example those generated by lex, since those use a finite state machine. But it is easier to use, because the recognizers are already written, well documented, and can be independently tested.
Some reusable parse tokens are also included in OpenToken, but they tend to be more domain-specific.
Ada's type safety features make misbehaving lexers and parsers easier to debug. In addition, OpenToken includes a trace feature, that outputs useful information as the parse progresses, also aiding debugging.
OpenToken is distributed under GPL version 3, with the GNAT modification that allows use in non-GPL projects.
OpenToken was originally written by Ted Dennison. Contributions were made by Christoph Grein and Stephen Leake.
OpenToken is currently maintained by Stephen Leake. Submit bugs via the Debian Bug system, against the opentoken source package. Discussion about OpenToken is on the newsgroup comp.lang.ada.
OpenToken may be obtained in several ways:
OpenToken has been tested on the following operating systems/compilers: