The noweb Hacker's Guide

Norman Ramsey [Author's current address is Department of Computer Science, Tufts University, Medford, MA 02155, USA; send email to nr@cs.tufts.edu.]
Department of Computer Science
Princeton University
September 1992
(Revised August 1994, December 1997)

Abstract

Noweb is unique among literate-programming tools in its pipelined architecture, which makes it easy for users to change its behavior or to add new features, without even recompiling. This guide describes the representation used in the pipeline and the behavior of the existing pipeline stages. Ordinary users will find nothing of interest here; the guide is addressed to those who want to change or extend noweb.


Introduction

[cite ramsey:simplified] describes noweb from a user's point of view, showing its simplicity and examples of its use. The noweb tools are implemented as pipelines. Each pipeline begins with the noweb source file. Successive stages of the pipeline implement simple transformations of the source, until the desired result emerges from the end of the pipeline. Figures [->] and [->] on page [->] show pipelines for notangle and noweave. Pipelines are responsible for noweb's extensibility, which enables its users to create new literate-programming features without having to write their own tools. This document explains how to change or extend noweb by inserting or removing pipeline stages. Readers should be familiar with the noweb man pages, which describe the structure of noweb source files.

Markup, which is the first stage in every pipeline, converts noweb source to a representation easily manipulated by common Unix tools like sed and awk, simplifying the construction of later pipeline stages. Middle stages add information to the representation. notangle's final stage converts to code; noweave's final stages convert to TeX, LaTeX or HTML. Middle stages are called filters, by analogy with Unix filters. Final stages are called back ends, by analogy with compilers—they don't transform noweb's intermediate representation; they emit something else.

The pipeline representation

In the pipeline, every line begins with an at sign and one of the keywords shown in Table [->]. The structural keywords represent the noweb source syntax directly. They must appear in particular orders that reflect the structure of the source. The tagging keywords can be inserted essentially anywhere (within reason), and with some exceptions, they are not generated by markup. The wrapper keywords mark the beginning and end of file, and they carry information about what formatters are supposed to do in the way of leading and trailing boilerplate. They are used by noweave but not by notangle, and they are inserted directly by the noweave shell script, not by markup.


Structural keywords
  @begin kind n              Start a chunk
  @end kind n                End a chunk
  @text string               string appeared in a chunk
  @nl                        A newline appeared in a chunk
  @defn name                 The code chunk named name is being defined
  @use name                  A reference to code chunk named name
  @quote                     Start of quoted code in a documentation chunk
  @endquote                  End of quoted code in a documentation chunk
Tagging keywords
  @file filename             Name of the file from which the chunks came
  @line n                    Next text line came from source line n in current file
  @language language         Programming language in which code is written
  @index ...                 Index information.
  @xref ...                  Cross-reference information.
Wrapper keywords
  @header formatter options  First line, identifying formatter and options
  @trailer formatter         Last line, identifying formatter.
Error keyword
  @fatal stagename message   A fatal error has occurred.
Lying, cheating, stealing keyword
  @literal text              Copy text to output.

Keywords used in noweb's pipeline representation [*]
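
Because every pipeline line starts with an at sign and a keyword, a filter can be little more than a loop over lines. The following is a minimal sketch in Python, not part of noweb; the names parse_line and drop_keywords are hypothetical, and the sketch assumes that the keyword is separated from its argument by a single space.

```python
def parse_line(line):
    """Split one pipeline line into (keyword, argument).

    The keyword is returned without its leading at sign. Everything after
    the first space is the argument, so leading blanks in @text survive.
    """
    line = line.rstrip("\n")
    if not line.startswith("@"):
        raise ValueError(f"not a pipeline line: {line!r}")
    keyword, _, argument = line[1:].partition(" ")
    return keyword, argument


def drop_keywords(lines, unwanted):
    """Yield the pipeline lines whose keyword is not in `unwanted`."""
    for line in lines:
        keyword, _ = parse_line(line)
        if keyword not in unwanted:
            yield line
```

Connected to standard input and standard output, drop_keywords(sys.stdin, {"index", "xref"}) would behave as a filter that strips all indexing and cross-reference information and passes everything else through unchanged.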


Structural keywords

The structural keywords represent the chunks in the noweb source. Each chunk is bracketed by a @begin... @end pair, and the kind of chunk is either docs or code. The @begin and @end are numbered; within a single file, numbers must be monotonically increasing, but they need not be consecutive. Filters may change chunk numbers at will.
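
The bracketing and numbering rules can be checked mechanically. The sketch below is illustrative only: check_chunks is a hypothetical name, and it assumes that "monotonically increasing" means strictly increasing, that each @end repeats the kind and number of its @begin, and that numbering starts afresh after each @file.

```python
def check_chunks(lines):
    """Return a list of messages describing @begin/@end problems."""
    errors = []
    open_chunk = None   # (kind, number) of the chunk that is currently open
    last_number = None  # number of the most recent @begin in this file
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "@file":
            last_number = None  # assumption: numbering is per file
            continue
        if fields[0] not in ("@begin", "@end"):
            continue
        if (len(fields) != 3 or fields[1] not in ("docs", "code")
                or not fields[2].isdigit()):
            errors.append(f"line {lineno}: malformed {fields[0]}")
            continue
        chunk = (fields[1], int(fields[2]))
        if fields[0] == "@begin":
            if open_chunk is not None:
                errors.append(f"line {lineno}: @begin inside an open chunk")
            if last_number is not None and chunk[1] <= last_number:
                errors.append(f"line {lineno}: chunk number {chunk[1]} does not increase")
            open_chunk, last_number = chunk, chunk[1]
        else:
            if chunk != open_chunk:
                errors.append(f"line {lineno}: @end does not match the open chunk")
            open_chunk = None
    if open_chunk is not None:
        errors.append("end of input: unterminated chunk")
    return errors
```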

Depending on its kind, a chunk may contain documentation or code. Documentation may contain text and newlines, represented by @text and @nl. It may also contain quoted code bracketed by @quote... @endquote. Every @quote must be terminated by an @endquote within the same chunk. Quoted code corresponds to the [[...]] construct in the noweb source.

Code, whether it appears in quoted code or in a code chunk, may contain text and newlines, and also definitions and uses of code chunks, marked with @defn and @use. The first structural keyword in any code chunk must be @defn. @defn may be preceded or followed by tagging keywords, but the next structural keyword must be @nl; together, the @defn and @nl represent the initial <<chunk name>>= that starts the chunk (including the terminating newline).
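
The rule that a code chunk opens with @defn followed by @nl, with only tagging keywords allowed in between, can be expressed as a small state machine. This is a sketch under the assumption that the structural keywords are exactly the eight listed in the table of keywords; check_code_chunk_heads is a hypothetical name.

```python
STRUCTURAL = {"@begin", "@end", "@text", "@nl",
              "@defn", "@use", "@quote", "@endquote"}


def check_code_chunk_heads(lines):
    """Check that each code chunk starts with @defn and then @nl."""
    errors = []
    expect = None  # structural keyword required next: None, "@defn" or "@nl"
    for lineno, line in enumerate(lines, start=1):
        keyword, _, rest = line.rstrip("\n").partition(" ")
        if keyword not in STRUCTURAL:
            continue  # tagging and other keywords may appear in between
        if expect is not None and keyword != expect:
            errors.append(f"line {lineno}: expected {expect}, found {keyword}")
            expect = None
        elif expect == "@defn":
            expect = "@nl"
        elif expect == "@nl":
            expect = None
        if keyword == "@begin" and rest.split()[:1] == ["code"]:
            expect = "@defn"
    return errors
```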

A few facts follow from what's already stated above, but are probably worth noting explicitly:

Tagging keywords

The structural keywords carry all the code and documentation that appears in a noweb source file. The tagging keywords carry information about that code or documentation. The @file keyword carries the name of the source file from which the following lines come. The @line keyword gives the line number of the next @text line within the current file (as determined by the most recent @file keyword). The only guarantee about where these appear is that markup introduces each new source file with a @file that appears between chunks. Most filters ignore @file and @line, but nt respects them, so that notangle can properly mark line numbers if some noweb filter starts moving lines around.
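
To illustrate how @file and @line might be consumed, here is a sketch that attaches a source location to every @text line. It is not how nt is implemented; locate_text is a hypothetical name, and the sketch assumes that numbering restarts at 1 after each @file and that every source newline shows up in the pipeline as either @nl or the @index nl keyword described under indexing.

```python
def locate_text(lines):
    """Return (filename, line number, text) for every @text line."""
    filename, lineno = None, 1
    located = []
    for line in lines:
        keyword, _, rest = line.rstrip("\n").partition(" ")
        if keyword == "@file":
            filename, lineno = rest, 1   # assumption: a new file starts at line 1
        elif keyword == "@line":
            lineno = int(rest)           # explicit resynchronization
        elif keyword == "@text":
            located.append((filename, lineno, rest))
        elif keyword == "@nl" or (keyword == "@index" and rest == "nl"):
            lineno += 1                  # both kinds of newline advance the count
    return located
```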

Programming languages

To support automatic indexing or prettyprinting, it's possible to indicate the programming language in which a chunk is written. The @language keyword may appear at most once between each @begin code and @end code pair. Standard values of @language and their associated meanings are:

  awk     awk
  c       C
  c++     C++
  caml    CAML
  html    HTML
  icon    Icon
  latex   LaTeX source
  lisp    Lisp or Scheme
  make    A Makefile
  m3      Modula-3
  ocaml   Objective CAML
  perl    A perl script
  python  Python
  sh      A shell script
  sml     Standard ML
  tex     plain TeX
  tcl     tcl

If the @language keyword catches on, it may be useful to create an automatic registry on the World-Wide Web.

I have made it impossible to place @language information directly in a noweb source file. My intent is that tools will identify the language of the root chunks using any of several methods: conventional names of chunks, being told on a command line, or identifying the language by looking at the content of the chunks. (Of these methods, the most practical is to name the root chunks after the files to which they will be extracted, and to use the same naming conventions as make to figure out what the contents are.) A noweb filter will tag non-root chunks with the appropriate @language by propagating information from uses to definitions.
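
As a rough illustration of that intent, the sketch below guesses a language from the name of a root chunk and then propagates it from uses to definitions. Nothing here is part of noweb: the suffix table, guess_language, and propagate_language are all hypothetical, and the mapping from file suffixes to @language values is an assumption made for the example.

```python
import os

# Assumed mapping from file suffixes to standard @language values.
SUFFIX_TO_LANGUAGE = {
    ".c": "c", ".awk": "awk", ".py": "python", ".sh": "sh",
    ".sml": "sml", ".tcl": "tcl", ".html": "html",
}


def guess_language(root_name):
    """Guess a @language value from a root chunk named after a file."""
    base = os.path.basename(root_name)
    if base in ("Makefile", "makefile"):
        return "make"
    return SUFFIX_TO_LANGUAGE.get(os.path.splitext(base)[1])


def propagate_language(uses, roots):
    """Tag every chunk reachable from a root with the root's language.

    `uses` maps a chunk name to the names of the chunks it uses.
    A chunk reached from several roots keeps the first language found.
    """
    language = {}
    for root in roots:
        lang = guess_language(root)
        if lang is None:
            continue
        stack = [root]
        while stack:
            name = stack.pop()
            if name in language:
                continue
            language[name] = lang
            stack.extend(uses.get(name, []))
    return language
```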

Indexing and cross-reference concepts

The index and cross-reference commands use labels, idents, and tags. A label is a unique string generated to refer to some element of a literate program. Labels serve as labels or ``anchor points'' for back ends that are capable of implementing their own cross-reference. So, for example, the LaTeX back end uses labels as arguments to \label and \ref, and the HTML back end uses labels to name and refer to anchors. Labels never contain white space, which simplifies parsing. The standard filters cross-reference at the chunk level, so that each label refers to a particular code chunk, and all references to that chunk use the same label.

An ident refers to a source-language identifier. Noweb's concept of identifier is general; an identifier is an arbitrary string. It can even contain whitespace. Identifiers are used as keys in the index; references to the same string are assumed to denote the same identifier.

Tags are the strings used to identify components for cross-reference in the final document. For example, Classic WEB uses consecutive ``section numbers'' to refer to chunks. Noweb, by default, uses ``sub-page references,'' e.g., ``24b'' for the second chunk appearing on page 24. The HTML back end doesn't use any tags at all; instead, it implements cross-referencing using the ``hot link'' mechanism.

The final step of cross-referencing involves generating tags and associating a tag with each label. All the existing back ends rely on a document formatter to do this job, but that strategy might be worth changing. Computing tags within a noweb filter could be lots easier than doing it in a formatter. For example, a filter that computed sub-page numbers by grubbing in .aux files would be pretty easy to write, and it would eliminate a lot of squirrely LaTeX code.
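
Once page numbers are known, assigning sub-page tags is simple bookkeeping. The sketch below assumes the page of each label has already been obtained somehow (it does not read .aux files), and it always appends a letter, which is a simplification; subpage_tags is a hypothetical name.

```python
import string


def subpage_tags(label_pages):
    """Map labels to tags such as "24b".

    `label_pages` is a sequence of (label, page) pairs in document order.
    """
    seen_on_page = {}  # page -> number of chunks already tagged on it
    tags = {}
    for label, page in label_pages:
        index = seen_on_page.get(page, 0)
        if index >= len(string.ascii_lowercase):
            raise ValueError(f"too many chunks on page {page}")
        tags[label] = f"{page}{string.ascii_lowercase[index]}"
        seen_on_page[page] = index + 1
    return tags
```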

Index information

I've divided the index keywords into several groups. There seems to be a plethora of keywords, but most of them are straightforward representations of parts of a document produced by noweave. Readers may want to have a sample of noweave's output handy when studying this and the next section.


Definitions, uses, and @ %def
  @index defn ident              The current chunk contains a definition of ident
  @index localdefn ident         The current chunk contains a definition of ident, which is not to be visible outside this file
  @index use ident               The current chunk contains a use of ident
  @index nl                      A newline that is part of markup, not part of the chunk
Identifiers defined in a chunk
  @index begindefs               Start list of identifiers defined in this chunk
  @index isused label            The identifier named in the following @index defitem is used in the chunk labelled by label
  @index defitem ident           ident is defined in this chunk, and it is used in all the chunks named in the immediately preceding @index isused.
  @index enddefs                 End list of identifiers defined in this chunk
Identifiers used in a chunk
  @index beginuses               Start list of identifiers used in this chunk
  @index isdefined label         The identifier named in the following @index useitem is defined in the chunk labelled by label
  @index useitem ident           ident is used in this chunk, and it is defined in each of the chunks named in the immediately preceding @index isdefined.
  @index enduses                 End list of identifiers used in this chunk
The index of identifiers
  @index beginindex              Start of the index of identifiers
  @index entrybegin label ident  Beginning of the entry for ident, whose first definition is found at label
  @index entryuse label          A use of the identifier named in the last @index entrybegin occurs at the chunk labelled with label.
  @index entrydefn label         A definition of the identifier named in the last @index entrybegin occurs at the chunk labelled with label.
  @index entryend                End of the entry started by the last @index entrybegin
  @index endindex                End of the index of identifiers
Indexing keywords [*]


Definitions, uses, and @ %def

@index defn, @index use, and @index nl are the only @index keywords that appear in markup's output, and thus the only ones that can appear in any program. They may appear only within the boundaries of a code chunk (@begin code... @end code). @index defn and @index use simply indicate that the current chunk contains a definition or use of the identifier ident which follows the keyword. The placement of @index defn need not bear a relationship to the text of the definition, but @index use is normally followed by a @text that contains the source-code text identified as the use. [This property can't hold when one identifier is a prefix of another; see the description of finduses on page [->].]

Instances of @index defn normally come from one of two sources: either a language-dependent recognizer of definitions, or a hand-written @ %def line. [The @ %def notation has been deprecated since version 2.10.] In the latter case, the line is terminated by a newline that is neither part of a code chunk nor part of a documentation chunk. To keep line numbers accurate, that newline can't just be abandoned, but neither can it be represented by @nl in a documentation or code chunk. The solution is the @index nl keyword, which serves no purpose other than to keep track of these newlines, so that back ends can produce accurate line numbers.

Following a suggestion by Oren Ben-Kiki, @index localdefn indicates a definition that is not to be visible outside the current file. It may be produced by a language-dependent recognizer or other filter. Because I have questions about the need for @index localdefn, there is officially no way to cause markup to produce it.

Identifiers defined in a chunk

The keywords from @index begindefs to @index enddefs are used to represent a more complex data structure giving the list of identifiers defined in a code chunk. The constellation represents a list of identifiers; one @index defitem appears for each identifier. The group also tells in what other chunks each identifier is used; those chunks are listed by @index isused keywords which appear just before @index defitem. The labels in these keywords appear in the order of the corresponding code chunks, and there are no duplicates.

These keywords can appear anywhere inside a code chunk, but filters are encouraged to keep these keywords together. The standard filters guarantee that only @index isused and @index defitem appear between @index begindefs and @index enddefs. The standard filters put them at the end of the code chunk, which simplifies translation by the LaTeX back end, but that strategy might change in the future.

It should go without saying, but the keywords in these and all similar groups (including some @xref groups) must be properly structured. That is to say:

  1. Every @index begindefs must have a matching @index enddefs within the same code chunk.
  2. @index isused and @index defitem may appear only between matching @index begindefs and @index enddefs.
  3. The damn things can't be nested.
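
A reader of these groups might enforce the three rules while collecting the data. The following sketch is one possible shape for such a reader; parse_defs is a hypothetical name, and the sketch relies on the facts that labels contain no white space while identifiers may, so the identifier is taken to be the whole remainder of the line.

```python
def parse_defs(lines):
    """Collect (ident, [labels of chunks that use it]) from @index groups.

    Raises ValueError when a group is nested, unmatched, or left open
    at the end of its chunk.
    """
    defs = []
    pending = []     # labels from @index isused awaiting their defitem
    inside = False   # True between @index begindefs and @index enddefs
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@end ") and inside:
            raise ValueError("@index begindefs not closed within its chunk")
        if not line.startswith("@index "):
            continue
        sub, _, arg = line[len("@index "):].partition(" ")
        if sub == "begindefs":
            if inside:
                raise ValueError("nested @index begindefs")
            inside, pending = True, []
        elif sub == "enddefs":
            if not inside:
                raise ValueError("@index enddefs without begindefs")
            inside = False
        elif sub in ("isused", "defitem"):
            if not inside:
                raise ValueError(f"@index {sub} outside begindefs/enddefs")
            if sub == "isused":
                pending.append(arg)
            else:
                defs.append((arg, pending))
                pending = []
    if inside:
        raise ValueError("unterminated @index begindefs")
    return defs
```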

Identifiers used in a chunk

The keywords from @index beginuses to @index enduses are the dual of @index begindefs to @index enddefs; the structure lists the identifiers used in the current code chunk, with cross-references to the definitions. Similar interpretations and restrictions apply. Note that an identifier can be defined in more than one chunk, although we expect that to be an unusual event.

The index of identifiers

Keywords @index beginindex to @index endindex represent the complete index of all the identifiers used in the document. Each entry in the index is bracketed by @index entrybegin... @index entryend. An entry provides the name of the identifier, plus the labels of all the chunks in which the identifier is defined or used. The label of the first defining chunk is given at the beginning of the entry so that back ends needn't search for it.

Filters are encouraged to keep these keywords together. The standard filters put them almost at the very end of the noweb file, just before the optional @trailer.
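
A back end or later filter could load the index into a dictionary before formatting it. This is only a sketch of one possible representation; parse_index and the dictionary keys are hypothetical, and the identifier is again taken to be everything after the label on the @index entrybegin line.

```python
def parse_index(lines):
    """Build {ident: {"first_defn", "defns", "uses"}} from the index group."""
    index = {}
    current = None  # entry being filled in, or None outside an entry
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("@index "):
            continue
        sub, _, arg = line[len("@index "):].partition(" ")
        if sub == "entrybegin":
            label, _, ident = arg.partition(" ")
            current = index.setdefault(
                ident, {"first_defn": label, "defns": [], "uses": []})
        elif sub == "entrydefn" and current is not None:
            current["defns"].append(arg)
        elif sub == "entryuse" and current is not None:
            current["uses"].append(arg)
        elif sub == "entryend":
            current = None
    return index
```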

Cross-reference information

The most basic function of the cross-referencing keywords is to associate labels and pointers (cross-references) with elements of the document, which is done with the @xref ref and @xref label keywords. The other @xref keywords all express chunk cross-reference information that is emitted directly by one or more back ends.
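
One simple use of @xref label and @xref ref is a consistency check that reports references to labels that are never defined. The sketch below assumes that all labels live in the same pipeline stream, which need not hold for every document; dangling_refs is a hypothetical name.

```python
def dangling_refs(lines):
    """Return labels named by @xref ref that no @xref label defines."""
    labels, refs = set(), []
    for line in lines:
        fields = line.split()  # labels never contain white space
        if len(fields) == 3 and fields[0] == "@xref":
            if fields[1] == "label":
                labels.add(fields[2])
            elif fields[1] == "ref":
                refs.append(fields[2])
    return [ref for ref in refs if ref not in labels]
```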

Chunk cross-reference introduces the idea of an anchor, which is a label that refers to an ``interesting point'' we identify with the beginning of a code chunk. The anchor is the place we expect to turn when we want to know about a code chunk; its exact value and interpretation depend on the back end being used. The standard LaTeX back end uses the sub-page number of the defining chunk as the anchor, but the standard HTML back end uses some @text from the documentation chunk preceding the code chunk.


Basic cross-reference
  @xref label label            Associates label with tagged item.
  @xref ref label              Cross-reference from tagged item to item associated with label.
Linking previous and next definitions of a code chunk
  @xref prevdef label          The @defn from the previous definition of this chunk is associated with label.
  @xref nextdef label          The @defn from the next definition of this chunk is associated with label.
Continued definitions of the current chunk
  @xref begindefs              Start ``This definition is continued in ...''
  @xref defitem label          Gives the label of a chunk in which the definition of the current chunk is continued.
  @xref enddefs                Ends the list of chunks where definition is continued.
Chunks where this code is used
  @xref beginuses              Start ``This code is used in ...''
  @xref useitem label          Gives the label of a chunk in which this chunk is used.
  @xref enduses                Ends the list of chunks in which this code is used.
  @xref notused name           Indicates that this chunk isn't used anywhere in this document.
The list of chunks
  @xref beginchunks            Start of the list of chunks
  @xref chunkbegin label name  Beginning of the entry for chunk name, whose anchor is found at label.
  @xref chunkuse label         The chunk is used in the chunk labelled with label.
  @xref chunkdefn label        The chunk is defined in the chunk labelled with label.
  @xref chunkend               End of the entry started by the last @xref chunkbegin
  @xref endchunks              End of the list of chunks
Converting labels to tags
@xref tag labe