Bisonc++ derives from bison++(1), which was itself derived from bison(1). Like these programs, bisonc++ generates a parser for an LALR(1) grammar. Bisonc++ generates C++ code: an expandable C++ class.
Refer to bisonc++(1) for a general overview. This manual page covers the structure and organization of bisonc++'s grammar file(s).
Bisonc++'s grammar file has the following generic outline:
directives (see the next section)
%%
grammar rules
Grammar rules have the following generic form:
nonterminal:
production-rules
;
A nonterminal has one or more production rules, each consisting of a sequence of zero or more terminal tokens, nonterminal tokens and/or action blocks. When multiple production rules are used they must be separated from each other by vertical bars (|). Action blocks are C++ compound statements.
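The outline above can be illustrated by a minimal sketch of a complete grammar file. The token name NUMBER and the nonterminal expr are merely illustrative, and the sketch assumes the default (int) semantic value type:

```yacc
// directives
%token NUMBER
%left '+'

%%

// grammar rules
expr:
    expr '+' expr
    {
        $$ = $1 + $3;   // action block: a C++ compound statement
    }
|
    NUMBER
;
```

The nonterminal expr has two production rules here, separated by a vertical bar; only the first one ends in an action block.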
This manual page contains the following sections:
Starting with version 6.02.00 bisonc++ reserved identifiers no longer end in two underscore characters, but in one. This modification was necessary because according to the C++ standard identifiers having two or more consecutive underscore characters are reserved by the language. In practice this could require some minor modifications of existing source files using bisonc++'s facilities, most likely limited to changing Tokens__ into Tokens_ and changing Meta__ into Meta_.
The complete list of affected names is:
DebugMode_, ErrorRecovery_, Return_, Tag_, Tokens_
PARSE_ABORT_, PARSE_ACCEPT_, UNEXPECTED_TOKEN_, sizeofTag_
Meta_, PI_, STYPE_
clearin_, errorRecovery_, errorVerbose_, executeAction_, lex_, lookup_, nextCycle_, nextToken_, popToken_, pop_, print_, pushToken_, push_, recovery_, redoToken_, reduce_, savedToken_, shift_, stackSize_, startRecovery_, state_, token_, top_, vs_
d_acceptedTokens_, d_actionCases_, d_debug_, d_nErrors_, d_requiredTokens_, d_val_, idOfTag_, s_nErrors_
Quite a few directives can be specified in the initial section of the grammar specification file. If command-line options for directives are available, then their specifications take precedence over the corresponding directives in the grammar file. Once class header or implementation header files exist, directives affecting those files are ignored.
Directives accepting a `filename' do not accept path names, i.e., they cannot contain directory separators (/); directives accepting a `pathname' may contain directory separators. A `pathname' containing blank characters should be surrounded by double quotes.
Some directives may generate errors. This happens when their specifications conflict with the contents of files bisonc++ cannot modify (e.g., a parser class header file exists that doesn't define a namespace, but in a later run a %namespace directive was provided).
To resolve such errors the offending directive could be omitted, the existing file could be removed, or the existing file could be hand-edited according to the directive's specification.
Filename defines the name of the file to contain the parser's base class. This class defines, e.g., the parser's symbolic tokens. Defaults to the name of the parser class plus the suffix base.h. This directive is overruled by the --baseclass-header (-b) command-line option.
It is an error if this directive is used and an already existing parser class header file does not contain #include "filename".
Pathname defines the path to the file preincluded by the parser's base-class header. See the description of the --baseclass-preinclude option for details about this directive. By default, bisonc++ surrounds pathname by double quotes, generating #include "pathname". However, when pathname itself is surrounded by angle brackets, #include <pathname> is generated.
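As a sketch of both forms, assuming the directive is spelled like its command-line option (%baseclass-preinclude), and using the hypothetical header name types.h:

```yacc
// Alternative 1: the base-class header receives  #include "types.h"
%baseclass-preinclude types.h

// Alternative 2 (used instead of alternative 1): the base-class header
// receives  #include <memory>
%baseclass-preinclude <memory>
```

Only one of the two alternatives would normally appear in a grammar file.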
Filename defines the name of the file to contain the parser class. Defaults to the name of the parser class plus the suffix .h. This directive is overruled by the --class-header (-c) command-line option.
It is an error if this directive is used and an already existing implementation header file does not contain #include "filename".
Declares the name of the parser class. It defines the name of the C++ class that is generated. If no %class-name is specified the default class name Parser is used.
It is an error if this directive is used and an already existing parser-class header file does not define class `className' and/or if an already existing implementation header file does not define members of the class `className'.
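A minimal sketch combining the class-related directives described so far. The directive spellings %baseclass-header and %class-header are assumed to mirror their command-line options, and the class name and filenames are purely illustrative:

```yacc
// name of the generated parser class (instead of the default Parser)
%class-name Calculator

// file receiving the parser's base class
%baseclass-header calculatorbase.h

// file receiving the parser class itself
%class-header calculator.h
```

If calculator.h already exists from an earlier run and defines a different class, the %class-name directive results in an error, as described above.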
Adds debugging code to the generated parse function and its support functions, which can show (on the standard output stream) the steps performed by the parsing function while it parses input streams. When this directive is specified, the parsing steps are shown by default. The setDebug members can be used to suppress outputting these parsing steps. #ifdef DEBUG macros are not used. Existing debugging code can be removed by rerunning bisonc++ without specifying the debug option or directive.
By default, bisonc++ adds a $$ = $1 action block to rules not having final action blocks, but not to empty production rules. This default behavior can also explicitly be configured using the default-actions std option or directive.
Bisonc++ also supports alternate ways of handling rules not having final action blocks. When off is specified, bisonc++ does not add $$ = $1 action blocks; when polymorphic semantic values are used, then specifying
- warn adds specialized action blocks, using the semantic types of the first elements of the production rules, while issuing a warning;
- quiet adds these action blocks without issuing warnings.
When either warn or quiet is specified the types of $$ and $1 must match. When bisonc++ detects a type mismatch it issues an error.
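The default (std) behavior might be sketched as follows, assuming the directive is spelled %default-actions; the token and nonterminal names are illustrative only:

```yacc
%token NUMBER
%default-actions std

%%

opt_expr:
    // empty production rule: no action block is added
|
    expr
    // no final action block: bisonc++ adds  $$ = $1
;

expr:
    NUMBER
    // no final action block: bisonc++ adds  $$ = $1
|
    '(' expr ')'
    {
        $$ = $2;    // explicit final action block: left as written
    }
;
```

With off instead of std none of these production rules would receive an added action block.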
This directive can be specified to dump the parser's state stack to the standard output stream when the parser encounters a syntactic error. The stack dump shows on separate lines a stack index followed by the state stored at the indicated stack element. The first stack element is the stack's top element.
This directive specifies the exact number of shift/reduce and reduce/reduce conflicts for which no warnings are to be generated. Details of the conflicts are reported in the verbose output file (e.g., grammar.output). If the number of actually encountered conflicts deviates from `number', then this directive is ignored.
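As an illustration, the classic `dangling else' grammar is expected to contain exactly one shift/reduce conflict. The sketch below assumes that the directive is spelled %expect; the token names are illustrative:

```yacc
%token IF ELSE EXPR

// one conflict is anticipated: no warning is wanted for it
%expect 1

%%

stmt:
    IF EXPR stmt
|
    IF EXPR stmt ELSE stmt
|
    EXPR ';'
;
```

If a later change to the grammar introduces an additional conflict, the count no longer matches, the directive is ignored, and the conflicts are reported again.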
Filename is a generic filename that is used for all header files generated by bisonc++. Options defining specific filenames are also available (which then, in turn, overrule the name specified by this directive). This directive is overruled by the --filenames (-f) command-line option.
When provided, the scanner member returning the matched text is called as d_scanner.YYText(), and the scanner member returning the next lexical token is called as d_scanner.yylex(). This directive is only interpreted if the %scanner directive is also provided.
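A minimal sketch of the combination, assuming the directive described here is spelled %flex and that scanner.h is a (hypothetical) header declaring the scanner class:

```yacc
// header declaring the scanner class used for the parser's d_scanner member
%scanner scanner.h

// use the flex-style member names:
//   d_scanner.YYText()  returns the matched text
//   d_scanner.yylex()   returns the next lexical token
%flex
```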
Filename defines the name of the file to contain the implementation header. It defaults to the name of the generated parser class plus the suffix .ih.
The implementation header should contain all directives and declarations that are only used by the parser's member functions. It is the only header file that is included by the source file containing parse's implementation. User-defined implementations of other class members may use the same convention, thus concentrating all directives and declarations that are required for the compilation of other source files belonging to the parser class in one header file.
This directive is used to switch to pathname while processing a grammar specification. Unless pathname defines an absolute file-path, pathname is searched relative to the location of bisonc++'s main grammar specification file (i.e., the grammar file that was specified as bisonc++'s command-line option). This directive can be used to split long grammar specification files into shorter, meaningful units. After processing pathname, processing continues beyond the %include pathname directive.
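A main grammar file that merely collects its parts might look like this sketch. The paths spec/directives and spec/rules are hypothetical, and it is assumed here that %include may be used in the rules section as well as in the directives section:

```yacc
// main grammar file, e.g., passed to bisonc++ on the command line
%include spec/directives

%%

%include spec/rules
```

Both paths are relative, so they are searched relative to the location of this main grammar file.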
Defines the names of symbolic terminal tokens that must be treated as left-associative. I.e., in case of a shift/reduce conflict, a reduction is preferred over a shift. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive defines the tokens having the lowest precedence, the last used defines the tokens having the highest precedence. See also %token below.
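The ordering of these directives can be sketched with a small expression grammar (token and nonterminal names are illustrative):

```yacc
%token NUMBER

%left  '+' '-'      // first directive: lowest precedence
%left  '*' '/'      // higher precedence than '+' and '-'
%right '^'          // last directive: highest precedence

%%

expr:
    expr '+' expr
|
    expr '-' expr
|
    expr '*' expr
|
    expr '/' expr
|
    expr '^' expr
|
    NUMBER
;
```

With these directives 1 - 2 - 3 is grouped as (1 - 2) - 3 because '-' is left-associative, and 1 + 2 * 3 is grouped as 1 + (2 * 3) because '*' is declared after '+'.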
Defines the organization of the location-struct data type LTYPE_. This struct should be specified analogously to the way the parser's stack type is defined using %union (see below). By default (if neither locationstruct nor LTYPE_ is specified) the standard location struct (see the next directive) is used.
This directive results in bisonc++ generating a parser using the standard location stack. This stack's default type is:
struct LTYPE_
{
    int timestamp;
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char *text;
};
Bisonc++ does not provide the elements of the LTYPE_ struct with values. Action blocks of production rules may refer to the location stack element associated with a production element using @ variables, like @1.timestamp, @3.text, @5. The rule's location struct itself may be referred to as either d_loc_ or @@.
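The use of @ variables might be sketched as follows. The sketch assumes that the directive requesting the standard location stack is spelled %lsp-needed, and that the location values of the individual elements have been filled in elsewhere (e.g., by the code calling the scanner), since bisonc++ itself does not provide them:

```yacc
%lsp-needed
%token NUMBER

%%

sum:
    sum '+' NUMBER
    {
        // let the rule's own location (@@) span its first
        // through its last element
        @@.first_line   = @1.first_line;
        @@.first_column = @1.first_column;
        @@.last_line    = @3.last_line;
        @@.last_column  = @3.last_column;
    }
|
    NUMBER
;
```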
Specifies a user-defined token location type. If %ltype is used, typename should be the name of an alternate (predefined) type (e.g., size_t). It should not be used if a %locationstruct specification is defined (see above). Within the parser class, this type is available as the type `LTYPE_'. All text on the line following %ltype is used for the typename specification. It should therefore not contain comments or any other characters that are not part of the actual type definition.
Define all of the code generated by bisonc++ in the namespace namespace. By default no namespace is defined. If this directive is used the implementation header is provided with a commented out using namespace declaration for the specified namespace. In addition, the parser and parser base class header files also use the specified namespace to define their include guard directives.
It is an error if this directive is used and an already existing parser-class header file and/or implementation header file does not define namespace identifier.
Do not generate warnings when zero- or negative dollar-indices are used in the grammar's action blocks. Zero or negative dollar-indices are commonly used to implement inherited attributes, and should normally be avoided. When used, they can be specified like $-1, or like $<type>-1, where type is empty; an STYPE_ tag; or a field-name. However, note that in combination with the %polymorphic directive (see below) only the $-i format can be used.
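An inherited attribute might be sketched like this, assuming the directive is spelled %negative-dollar-indices. The member function declare is hypothetical and would have to be provided by the parser class; the token names are illustrative:

```yacc
%negative-dollar-indices
%token TYPE IDENT

%%

declaration:
    TYPE identifiers ';'
;

identifiers:
    IDENT
    {
        // $0 is the element just before `identifiers' in the
        // enclosing production rule: TYPE's semantic value
        declare($0, $1);
    }
|
    identifiers ',' IDENT
    {
        // $0 again refers to TYPE: it precedes the first
        // element of this production rule
        declare($0, $3);
    }
;
```

The sketch only works as long as every use of identifiers is immediately preceded by TYPE, which is one reason why such indices should normally be avoided.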
By default #line preprocessor directives are inserted just before action statements in the file containing the parser's parse function. These directives are suppressed by the %no-lines directive.
Defines the names of symbolic terminal tokens that should be treated as non-associative. I.e., in case of a shift/reduce conflict, a reduction is preferred over a shift. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive defines the tokens having the lowest precede