
PSPP

This file documents the PSPP package for statistical analysis of sampled data. This is edition 0.2, for PSPP version 0.2, last modified 2000-01-02 22:32:14 (blp).

1. Introduction  Description of the package.
2. Your rights and obligations  
3. Credits  Acknowledgement of authors.

4. Installing PSPP  How to compile and install PSPP.
5. Configuring PSPP  
6. Invoking PSPP  Starting and running PSPP.

7. The PSPP language  Basics of the PSPP command language.
8. Mathematical Expressions  Numeric and string expression syntax.

9. Data Input and Output  Reading data from user files.
10. System Files and Portable Files  Dealing with system & portable files.
11. Manipulating variables  Adjusting and examining variables.
12. Data transformations  Simple operations on data.
13. Selecting data for analysis  Select certain cases for analysis.
14. Conditional and Looping Constructs  Doing things many times or not at all.
15. Statistics  Basic statistical procedures.
16. Utilities  Other commands.
17. Not Implemented  What's not here yet

18. Data File Format  Format of PSPP system files.
19. Portable File Format  Format of PSPP portable files.
20. q2c Input Format  Format of syntax accepted by q2c.

21. Bugs  Known problems; submitting bug reports.

22. Function Index  Index of PSPP functions for expressions.
23. Concept Index  Index of concepts.
24. Command Index  Index of PSPP procedures.



1. Introduction

PSPP is a tool for statistical analysis of sampled data. It reads a syntax file and a data file, analyzes the data, and writes the results to a listing file or to standard output.

The language accepted by PSPP is similar to those accepted by SPSS statistical products. The details of PSPP's language are given later in this manual.

PSPP produces output in two forms: tables and charts. Both of these can be written in several formats; currently, ASCII, PostScript, and HTML are supported. In the future, more drivers, such as PCL and X Window System drivers, may be developed. For now, Ghostscript, available from the Free Software Foundation, may be used to convert PostScript chart output to other formats.

The current version of PSPP, 0.2, is woefully incomplete in terms of its statistical procedure support. PSPP is a work in progress. The author hopes eventually to fully support all features in the products that PSPP replaces. The author welcomes questions, comments, donations, and code submissions. See section Submitting Bug Reports, for instructions on contacting the author.



2. Your rights and obligations

Most of PSPP is distributed under the GNU General Public License. The General Public License says, in effect, that you may modify and distribute PSPP as you like, as long as you grant the same rights to others. It also states that you must provide source code when you distribute PSPP, or, if you obtained PSPP source code from an anonymous ftp site, give out the name of that site.

The General Public License is given in full in the source distribution as file `COPYING'. In Debian GNU/Linux, this file is also available as file `/usr/doc/copyright/GPL'.

To quote the GPL itself:

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.



3. Credits

I'm always embarrassed when I see an index an author has made of his own work. It's a shameless exhibition--to the trained eye. Never index your own book.

---Claire Minton, Cat's Cradle, Kurt Vonnegut, Jr.

Most of PSPP, as well as this manual (including the indices), was written by Ben Pfaff. See section 21.2 Contacting the Author, for instructions on contacting the author.

The PSPP source code incorporates julcal10 originally written by Michael A. Covington and translated into C by Jim Van Zandt. The original package can be found in directory `ftp://ftp.cdrom.com/pub/algorithms/c/julcal10'. The entire contents of that directory constitute the package. The files actually used in PSPP are julcal.c and julcal.h.



4. Installing PSPP

PSPP conforms to the GNU Coding Standards. PSPP is written in, and requires for proper operation, ANSI/ISO C. You might want to additionally note the following points:

Many UNIX variants should work out-of-the-box, as PSPP uses GNU autoconf to detect differences between environments. Please report any problems with compilation of PSPP under UNIX and UNIX-like operating systems--portability is a major concern of the author.

The pages below give specific instructions for installing PSPP on each type of system mentioned above.

4.1 UNIX installation  Installing on UNIX-like environments.



4.1 UNIX installation

To install PSPP under a UNIX-like operating system, follow the steps below in order. Some of the text below was taken directly from various Free Software Foundation sources.

  1. cd to the directory containing the PSPP source.

  2. Type `./configure' to configure for your particular operating system and compiler. Running configure takes a while. While running, it displays some messages telling which features it is checking for.

    You can optionally supply some options to configure in order to give it hints about how to do its job. Type `./configure --help' to see a list of options. One of the most useful options is `--with-checker', which enables the use of the Checker memory debugger under supported operating systems. Checker must already be installed to use this option. Do not use `--with-checker' if you are not debugging PSPP itself.

  3. (optional) Edit `Makefile', `config.h', and `pref.h'. These files are produced by configure. Note that most PSPP settings can be changed at runtime.

    `pref.h' is only generated by configure if it does not already exist. (It's copied from `prefh.orig'.)

  4. Type `make' to compile the package. If there are any errors during compilation, try to fix them. If modifications are necessary to compile correctly under your configuration, contact the author. See section Submitting Bug Reports, for details.

  5. Type `make check' to run self-tests on the compiled PSPP package.

  6. Become the superuser and type `make install' to install the PSPP binaries, by default in `/usr/local/bin/'. The directory `/usr/local/share/pspp/' is created and populated with files needed by PSPP at runtime. This step will also cause the PSPP documentation to be installed in `/usr/local/info/', but only if that directory already exists.

  7. (optional) Type `make clean' to delete the PSPP binaries from the source tree.



5. Configuring PSPP

PSPP has dozens of configuration possibilities and hundreds of settings. This is both a bane and a blessing. On one hand, it's possible to easily accommodate diverse ranges of setups. But, on the other, the multitude of possibilities can overwhelm the casual user. Fortunately, the configuration mechanisms are profusely described in the sections below....

5.1 Locating configuration files  How PSPP finds config files.
5.2 Configuration techniques  Many different methods of configuration....
5.3 Configuration files  How configuration files are read.
5.4 Environment variables  All about environment variables.
5.5 Output devices  Describing your terminal(s) and printer(s).
5.6 The PostScript driver class  Configuration of PostScript devices.
5.7 The ASCII driver class  Configuration of character-code devices.
5.8 The HTML driver class  Configuration for HTML output.
5.9 Miscellaneous configuration  Even more configuration variables.
5.10 Improving output quality  Hints for producing ever-more-lovely output.



5.1 Locating configuration files

PSPP uses the same method to find most of its configuration files:

  1. The base name of the file being sought is determined.

  2. The path to search is determined.

  3. Each directory in the search path, from left to right, is searched for a file with the name of the base name. The first occurrence is read as the configuration file.

The first two steps are elaborated below for the sake of our pedantic friends.

  1. A base name is a file name lacking an absolute directory reference. Some examples of base names are: `ps-encodings', `devices', `devps/DESC' (under UNIX), `devps\DESC' (under M$ environments).

    Determining the base name is a two-step process:

    1. If the appropriate environment variable is defined, the value of that variable is used (see section 5.4 Environment variables). For instance, when searching for the output driver initialization file, the variable examined is STAT_OUTPUT_INIT_FILE.

    2. Otherwise, the compiled-in default is used. For example, when searching for the output driver initialization file, the default base name is `devices'.

    Please note: If a user-specified base name does contain an absolute directory reference, as in a file name like `/home/pfaff/fonts/TR', no path is searched--the file name is used exactly as given--and the algorithm terminates.

  2. The path is the first of the following that is defined:

As a final note: Under DOS, directories given in paths are delimited by semicolons (`;'); under UNIX, directories are delimited by colons (`:'). This corresponds with the standard path delimiter under these OSes.
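
The search described above can be sketched in a few lines of Python. This is only an illustration, not PSPP's own code: the function name `find_config_file' and its `exists' parameter are made up for the example, and the search path is taken as an already-determined string.

```python
import os

def find_config_file(base_name, search_path, exists=os.path.isfile):
    """Return the first file named base_name along search_path, or None."""
    # A name with an absolute directory reference is used exactly as given;
    # no path is searched.
    if os.path.isabs(base_name):
        return base_name
    # os.pathsep is ':' under UNIX and ';' under DOS-like systems.
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, base_name)
        if exists(candidate):
            # The first occurrence, scanning left to right, wins.
            return candidate
    return None
```

The `exists' argument is there only so that the search can be tried out without touching the real file system.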



5.2 Configuration techniques

There are many ways that PSPP can be configured. These are described in the list below. Values given by earlier items take precedence over those given by later items.

  1. Syntax commands that modify settings, such as SET.

  2. Command-line options. See section 6. Invoking PSPP.

  3. PSPP-specific environment variable contents. See section 5.4 Environment variables.

  4. General environment variable contents. See section 5.4 Environment variables.

  5. Configuration file contents. See section 5.3 Configuration files.

  6. Fallback defaults.

Some of the above may not apply to a particular setting. For instance, the current pager (such as `more', `most', or `less') cannot be determined by configuration file contents because there is no appropriate configuration file.
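
The precedence rule amounts to "the first source that defines a setting wins". A minimal Python sketch follows, assuming each configuration source has already been reduced to a dictionary; the function name and the setting names in the usage note are hypothetical.

```python
def resolve_setting(name, sources):
    """Look up a setting in sources ordered from highest to lowest precedence.

    Each source is a dict. A source that does not apply to the setting simply
    lacks the key, so the lookup falls through to the next source.
    """
    for source in sources:
        if name in source:
            return source[name]
    return None
```

For example, with sources listed in the order given above (syntax commands, command line, PSPP-specific environment, general environment, configuration files, fallback defaults), a value set on the command line hides one given in a configuration file.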



5.3 Configuration files

Most configuration files have a common form:



5.4 Environment variables

You may think the concept of environment variables is a fairly simple one. However, the author of PSPP has found a way to complicate even something so simple. Environment variables are further described in the sections below:

5.4.1 Values of environment variables  Values of variables are determined this way.
5.4.2 Environment substitutions  How environment substitutions are made.
5.4.3 Predefined environment variables  A few variables are automatically defined.



5.4.1 Values of environment variables

Values for environment variables are obtained by the following means, which are arranged in order of decreasing precedence:

  1. Command-line options. See section 6. Invoking PSPP.

  2. The `environment' configuration file--more on this below.

  3. Actual environment variables (defined in the shell or other parent process).

The `environment' configuration file is located through application of the usual algorithm for configuration files (see section 5.1 Locating configuration files), except that its contents do not affect the search path used to find `environment' itself. Use of `environment' is discouraged on systems that allow an arbitrarily large environment; it is supported for use on systems like MS-DOS that limit environment size.

`environment' is composed of lines having the form `key=value', where key and the equals sign (`=') are required, and value is optional. If value is given, variable key is given that value; if value is absent, variable key is undefined (deleted). Variables may not be defined with a null value.
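
The `key=value' rule can be illustrated with a short Python sketch. It is not taken from PSPP. The function name is made up, blank lines are skipped as an assumption, whitespace around key and value is trimmed as an assumption, and comment handling and environment substitution are left out.

```python
def apply_environment_lines(lines, variables):
    """Apply `key=value' lines to the dict variables, in place."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            # Both the key and the equals sign are required.
            raise ValueError("malformed line: %r" % line)
        value = value.strip()
        if value:
            variables[key] = value
        else:
            # An absent value undefines (deletes) the variable.
            variables.pop(key, None)
    return variables
```

Because an absent value deletes the variable, there is no way in this scheme to define a variable with a null value, which matches the rule stated above.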

Environment substitutions are performed on each line in the file (see section 5.4.2 Environment substitutions).

See 5.3 Configuration files, for more details on formatting of the environment configuration file.

Please note: Support for `environment' is not yet implemented.



5.4.2 Environment substitutions

Much of the power of environment variables lies in the way that they may be substituted into configuration files. Variable substitutions are described below.

The line is scanned from left to right. In this scan, all characters other than dollar signs (`$') are retained unmolested. Dollar signs, however, introduce an environment variable reference. References take three forms:

$var
Replaced by the value of environment variable var, determined as specified in 5.4.1 Values of environment variables. var must be one of the following:

${var}
Same as above, but var may contain any character (except `}').

$$
Replaced by a single dollar sign.

Undefined variables expand to an empty value.
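
The left-to-right scan can be sketched in Python as below. This is an illustration only. Two points are assumptions, since they are not specified here: a bare `$var' name is taken to consist of letters, digits, and underscores, and a lone `$' or an unterminated `${' is copied through unchanged.

```python
def substitute(line, variables):
    """Expand $var, ${var}, and $$ references in line using dict variables."""
    out = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch != "$":
            out.append(ch)          # ordinary characters are kept as-is
            i += 1
        elif line.startswith("$$", i):
            out.append("$")         # $$ yields a single dollar sign
            i += 2
        elif line.startswith("${", i):
            end = line.find("}", i + 2)
            if end == -1:           # assumption: unterminated ${ stays literal
                out.append(line[i:])
                break
            out.append(variables.get(line[i + 2:end], ""))
            i = end + 1
        else:
            j = i + 1
            # Assumption: bare names are letters, digits, and underscores.
            while j < n and (line[j].isalnum() or line[j] == "_"):
                j += 1
            name = line[i + 1:j]
            # Undefined variables expand to an empty value.
            out.append(variables.get(name, "") if name else "$")
            i = j
    return "".join(out)
```

With VER defined as `0.2', the line `pspp-$VER/${ARCH}/$$' would expand to `pspp-0.2//$' when ARCH is undefined.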



5.4.3 Predefined environment variables

There are two environment variables predefined for use in environment substitutions:

`VER'
Defined as the version number of PSPP, as a string, in a format something like `0.9.4'.

`ARCH'
Defined as the host architecture of PSPP, as a string, in standard cpu-manufacturer-OS format. For instance, Debian GNU/Linux 1.1 on an Intel machine defines this as `i586-unknown-linux'. This is somewhat dependent on the system used to compile PSPP.

Nothing prevents these values from being overridden, although it's a good idea not to do so.



5.5 Output devices

Configuring output devices is the most complicated aspect of configuring PSPP. The output device configuration file is named `devices'. It is searched for using the usual algorithm for finding configuration files (see section 5.1 Locating configuration files). Each line in the file is read in the usual manner for configuration files (see section 5.3 Configuration files).

Lines in `devices' are divided into three categories, described briefly in the table below:

driver category definitions
Define a driver in terms of other drivers.

macro definitions
Define environment variables local to the output driver configuration file.

device definitions
Describe the configuration of an output device.

The following sections further elaborate the contents of the `devices' file.

5.5.1 Driver categories  How to organize the driver namespace.
5.5.2 Macro definitions  Environment variables local to `devices'.
5.5.3 Driver definitions  Output device descriptions.
5.5.4 Dimensions  Lengths, widths, sizes, ....
5.5.5 Paper sizes  Letter, legal, A4, envelope, ....
5.5.6 How lines are divided into types  Details on `devices' parsing.
5.5.7 How lines are divided into tokens  Dividing `devices' lines into tokens.



5.5.1 Driver categories

Drivers can be divided into categories. Drivers are specified by their names, or by the names of the categories that they are contained in. Only certain drivers are enabled each time PSPP is run; by default, these are the drivers in the category `default'. To enable a different set of drivers, use the `-o device' command-line option (see section 6. Invoking PSPP).

Categories are specified with a line of the form `category=driver1 driver2 driver3 ... drivern'. This line specifies that the category named category is composed of the drivers named driver1, driver2, and so on. There may be any number of drivers in the category, from zero on up.

Categories may also be specified on the command line (see section 6. Invoking PSPP).

This is all you need to know about categories. If you're still curious, read on.

First of all, the term `categories' is a bit of a misnomer. In fact, the internal representation is nothing like the hierarchy that the term seems to imply: a linear list is used to keep track of the enabled drivers.

When PSPP first begins reading `devices', this list contains the name of any drivers or categories specified on the command line, or the single item `default' if none were specified.

Each time a category definition is specified, the list is searched for an item with the value of category. If a matching item is found, it is deleted. If there was a match, the list of drivers (driver1 through drivern) is then appended to the list.

Each time a driver definition line is encountered, the list is searched. If the list contains an item with that driver's name, the driver is enabled and the item is deleted from the list. Otherwise, the driver is not enabled.

It is an error if the list is not empty when the end of `devices' is reached.
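
The list-based bookkeeping just described can be sketched in Python. This is a model of the description, not PSPP's source. The `devices' file is assumed to be already parsed into tuples, and the function name, tuple layout, and driver names in the usage note are invented for the example.

```python
def enabled_drivers(definitions, requested=None):
    """Return the names of enabled drivers, in file order.

    definitions: entries in file order, each either
        ("category", name, [driver names]) or ("driver", name).
    requested: names given on the command line, if any.
    """
    # The list starts with the command-line names, or just `default'.
    pending = list(requested) if requested else ["default"]
    enabled = []
    for entry in definitions:
        if entry[0] == "category":
            _, name, members = entry
            if name in pending:
                pending.remove(name)       # delete the matching item...
                pending.extend(members)    # ...and append its drivers
        else:
            _, name = entry
            if name in pending:
                pending.remove(name)
                enabled.append(name)
    if pending:
        # Leftover names at the end of `devices' are an error.
        raise ValueError("unknown drivers or categories: %s" % pending)
    return enabled
```

Given a category line defining `default' as drivers `tty' and `list', followed by driver definitions for `tty', `ps', and `list', a run with no command-line names enables `tty' and `list' but not `ps'. Note that order matters in this model: a category must be defined before the drivers it names.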



5.5.2 Macro definitions

Macro definitions take the form `define macroname definition'. In such a macro definition, the environment variable macroname is defined to expand to the value definition. Before the definition is made, however, any macros used in definition are expanded.

Please note the following nuances of macro usage:



5.5.3 Driver definitions

Driver definitions are the ultimate purpose of the `devices' configuration file. These are where the real action is. Driver definitions tell PSPP where it should send its output.

Each driver definition line is divided into four fields. These fields are delimited by colons (`:'). Each line is subjected to environment variable interpolation before it is processed further (see section 5.4.2 Environment substitutions). From left to right, the four fields are, in brief:

driver name
A unique identifier, used to determine whether to enable the driver.

class name
One of the predefined driver classes supported by PSPP. The currently supported driver classes include `postscript' and `ascii'.

device type(s)
Zero or more of the following keywords, delimited by spaces:

screen

Indicates that the device is a screen display. This may reduce the amount of buffering done by the driver, to make interactive use more convenient.

printer

Indicates that the device is a printer.

listing

Indicates that the device is a listing file.

These options are just hints to PSPP and do not cause the output to be directed to the screen, or to the printer, or to a listing file--those must be set elsewhere in the options. They are used primarily to decide which devices should be enabled at any given time. See section 16.10 SET, for more information.

options
An optional set of options to pass to the driver itself. The exact format for the options varies among drivers.

The driver is enabled if:

  1. Its driver name is specified on the command line, or

  2. It's in a category specified on the command line, or

  3. If no categories or driver names are specified on the command line, it is in category default.

For more information on driver names, see 5.5.1 Driver categories.

The class name must be one of those supported by PSPP. The classes supported depend on the options with which PSPP was compiled. See later sections in this chapter for descriptions of the available driver classes.

Options are dependent on the driver. See the driver descriptions for details.
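
Splitting a driver definition line into its four fields might look like the following Python sketch. It is illustrative only. The function name and the returned dictionary layout are made up, missing trailing fields are treated as empty, and colons after the third one are assumed to belong to the options field. Environment substitution is assumed to have been done already.

```python
DEVICE_TYPES = {"screen", "printer", "listing"}

def parse_driver_definition(line):
    """Split `name:class:types:options' into a dict of its four fields."""
    # Assumption: at most three colons delimit fields; later ones are
    # left inside the options field.
    fields = [field.strip() for field in line.split(":", 3)]
    fields += [""] * (4 - len(fields))   # assumption: missing fields are empty
    name, class_name, types, options = fields
    device_types = types.split()         # zero or more space-separated keywords
    for keyword in device_types:
        if keyword not in DEVICE_TYPES:
            raise ValueError("unknown device type: %r" % keyword)
    return {"name": name, "class": class_name,
            "types": device_types, "options": options}
```

The options field is returned as an uninterpreted string, since its format varies among drivers.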



5.5.4 Dimensions

Quite often in configuration it is necessary to specify a length or a size. PSPP uses a common syntax for all such, calling them collectively by the name dimensions.



5.5.5 Paper sizes

Output drivers usually deal with some sort of hardcopy media. This media is called paper by the drivers, though in reality it could be a transparency or film or thinly veiled sarcasm. To make it easier for you to deal with paper, PSPP allows you to have (of course!) a configuration file that gives symbolic names, like "letter" or "legal" or "a4", to paper sizes, rather than forcing you to use cryptic numbers like "8-1/2 x 11" or "210 by 297". Surprisingly enough, this configuration file is named `papersize'. See section 5.3 Configuration files.

When PSPP tries to connect a symbolic paper name to a paper size, it reads and parses each non-comment line in the file, in order. The first field on each line must be a symbolic paper name in double quotes. Paper names may not contain double quotes. Paper names are not case-sensitive: `legal' and `Legal' are equivalent.

If a match is found for the paper name, the rest of the line is parsed. If it is found to be a pair of dimensions (see section 5.5.4 Dimensions) separated by either `x' or `by', then those are taken to be the paper size, in order of width followed by length. There must be at least one space on each side of `x' or `by'.

Otherwise the line must be of the form `"paper-1"="paper-2"'. In this case the target of the search becomes paper name paper-2 and the search through the file continues.
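
The lookup, including the `"paper-1"="paper-2"' redirection, can be sketched in Python. This is a rough model, not the real parser. Each dimension is assumed to be a single token without whitespace and is returned as an uninterpreted string. Comment lines are assumed to have been removed already, and optional whitespace around `=' is tolerated as an assumption.

```python
import re

_ALIAS = re.compile(r'^"([^"]*)"\s*=\s*"([^"]*)"$')
_SIZE = re.compile(r'^"([^"]*)"\s+(\S+)\s+(?:x|by)\s+(\S+)$')

def lookup_paper_size(lines, paper_name):
    """Return (width, length) strings for paper_name, or None if not found."""
    target = paper_name.lower()          # paper names are not case-sensitive
    for line in lines:
        line = line.strip()
        alias = _ALIAS.match(line)
        if alias:
            if alias.group(1).lower() == target:
                # Retarget the search and keep reading from this point on.
                target = alias.group(2).lower()
            continue
        size = _SIZE.match(line)
        if size and size.group(1).lower() == target:
            return size.group(2), size.group(3)   # width, then length
    return None
```

Because the search continues from the current line rather than restarting, in this model an alias only works if the paper it points to is defined later in the file.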



5.5.6 How lines are divided into types

The lines in `devices' are distinguished in the following manner:

  1. Leading whitespace is removed.

  2. If the resulting line begins with the exact string define, followed by one or more whitespace characters, the line is processed as a macro definition.

  3. Otherwise, the line is scanned for the first instance of a colon (`:') or an equals sign (`=').

  4. If a colon is encountered first, the line is processed as a driver definition.

  5. Otherwise, if an equals sign is encountered, the line is processed as a driver category definition.

  6. Otherwise, the line is ill-formed.
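
These rules translate almost directly into code. The Python sketch below is an illustration, with a made-up function name. It treats a line whose first delimiter is an equals sign as a category definition, consistent with the `category=driver1 ...' form given in 5.5.1 Driver categories. Blank and comment lines are assumed to have been dropped beforehand.

```python
def classify_devices_line(line):
    """Return "macro", "driver", "category", or "ill-formed"."""
    line = line.lstrip()                      # step 1: drop leading whitespace
    # Step 2: the exact string `define' followed by whitespace.
    if line.startswith("define") and line[6:7].isspace():
        return "macro"
    # Steps 3 to 5: whichever of ':' or '=' comes first decides the type.
    for ch in line:
        if ch == ":":
            return "driver"
        if ch == "=":
            return "category"
    return "ill-formed"                       # step 6
```

One consequence is that a driver definition may freely contain equals signs after its first colon, but a category line cannot contain a colon before its equals sign.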



5.5.7 How lines are divided into tokens

Each driver definition line is run through a simple tokenizer. This tokenizer recognizes two basic types of tokens.

The first type is an equals sign (`='). Equals signs are both delimiters between tokens and tokens in themselves.

The second type is an identifier or string token. Identifiers and strings are equivalent after tokenization, though they are written differently. An identifier is any string of characters other than whitespace or equals sign.

A string is introduced by a single- or double-quote character (`'' or `"') and, in general, continues until the next occurrence of that same character. The following standard C escapes can also be embedded within strings:

\'
A single-quote (`'').

\"
A double-quote (`"').

\?
A question mark (`?'). Included for hysterical raisins.

\\
A backslash (`\').

\a
Audio bell (ASCII 7).

\b
Backspace (ASCII 8).

\f
Formfeed (ASCII 12).

\n
Newline (ASCII 10).

\r
Carriage return (ASCII 13).

\t
Tab (ASCII 9).

\v
Vertical tab (ASCII 11).

\ooo
Each `o' must be an octal digit. The character is the one having the octal value specified. Any number of octal digits is read and interpreted; only the lower 8 bits are used.

\xhh
Each `h' must be a hex digit. The character is the one having the hexadecimal value specified. Any number of hex digits is read and interpreted; only the lower 8 bits are used.

Tokens, outside of quoted strings, are delimited by whitespace or equals signs.
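
A tokenizer along these lines can be sketched in Python as follows. It is a model of the description above, not PSPP's implementation. The function names and the ("equals", ...) and ("text", ...) token pairs are invented for the example. Unknown escapes, a `\x' with no digits, and unterminated strings are handled by assumption.

```python
_SIMPLE = {"'": "'", '"': '"', "?": "?", "\\": "\\", "a": "\a", "b": "\b",
           "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
_OCT = "01234567"
_HEX = "0123456789abcdefABCDEF"

def _read_string(line, i):
    """Read a quoted string starting at line[i]; return (text, next index)."""
    quote, n, out = line[i], len(line), []
    i += 1
    while i < n and line[i] != quote:
        if line[i] != "\\" or i + 1 >= n:
            out.append(line[i])
            i += 1
            continue
        esc = line[i + 1]
        i += 2
        if esc in _SIMPLE:
            out.append(_SIMPLE[esc])
        elif esc in _OCT or esc == "x":
            digits, base = (_HEX, 16) if esc == "x" else (_OCT, 8)
            start = i if esc == "x" else i - 1   # octal: esc is the first digit
            end = start
            while end < n and line[end] in digits:
                end += 1                         # any number of digits is read
            if end == start:
                out.append("x")                  # assumption: bare \x is 'x'
            else:
                out.append(chr(int(line[start:end], base) & 0xFF))
            i = end
        else:
            out.append(esc)                      # assumption: unknown escape
    return "".join(out), i + 1                   # skip the closing quote

def tokenize(line):
    """Split a line into ("equals", "=") and ("text", value) tokens."""
    tokens, i, n = [], 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch == "=":
            tokens.append(("equals", "="))
            i += 1
        elif ch in "'\"":
            text, i = _read_string(line, i)
            tokens.append(("text", text))
        else:
            j = i
            while j < n and not line[j].isspace() and line[j] != "=":
                j += 1
            tokens.append(("text", line[i:j]))   # identifier
            i = j
    return tokens
```

Identifiers and strings both come out as "text" tokens, reflecting that they are equivalent after tokenization. Keeping the equals sign as its own token kind lets a quoted `=' be told apart from a delimiter.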



5.6 The PostScript driver class

The postscript driver class is used to produce output that is acceptable to PostScript printers and to PC-based PostScript interpreters such as Ghostscript. Continuing a long tradition, PSPP's PostScript driver is configurable to the point of absurdity.

There are actually two PostScript drivers. The first one, `postscript', produces ordinary DSC-compliant PostScript output. The second one, `epsf', produces an Encapsulated PostScript file. The two drivers are otherwise identical in configuration and in operation.

The PostScript driver is described in further detail below.

5.6.1 PostScript output options  Output file options.
5.6.2 PostScript page options  Paper, margins, scaling & rotation, more!
5.6.3 PostScript file options  Configuration files.
5.6.4 PostScript font options  Default fonts, font options.
5.6.5 PostScript line options  Line widths, options.
5.6.6 The PostScript prologue  Details on the PostScript prologue.
5.6.7 PostScript encodings  Details on PostScript font encodings.



5.6.1 PostScript output options

These options deal with the form of the output and the output file itself:

output-file=filename

File to which output should be sent. This can be an ordinary filename (e.g., "pspp.ps"), a pipe filename (e.g., "|lpr"), or stdout ("-"). Default: "pspp.ps".

color=boolean

Most of the time black-and-white PostScript devices are smart enough to map colors to shades themselves. However, you can make the PSPP PostScript driver do an ugly simulation of this itself by turning color off. Default: on.

This is a boolean setting, as are many settings in the PostScript driver. Valid positive boolean values are `on', `true', `yes', and nonzero integers. Negative boolean values are `off', `false', `no', and zero.

data=data-type

One of clean7bit, clean8bit, or binary. This controls what characters will be written to the output file. PostScript produced with clean7bit can be transmitted over 7-bit transmission channels that use ASCII control characters for line control. clean8bit is similar but allows characters above 127 to be written to the output file. binary allows any character in the output file. Default: clean7bit.

line-ends=line-end-type

One of cr, lf, or crlf. This controls what is used for newline in the output file. Default: cr.

optimize-line-size=level

Either 0 or 1. If level is 1, then short line segments will be collected and merged into longer ones. This reduces output file size but requires more time and memory. A level of 0 has the advantage of being better for interactive environments. 1 is the default unless the screen flag is set; in that case, the default is 0.

optimize-text-size=level

One of 0, 1, or 2, each higher level representing correspondingly more aggressive space savings for text in the output file and requiring correspondingly more time and memory. Unfortunately the levels presently are all the same. 1 is the default unless the screen flag is set; in that case, the default is 0.
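
The boolean values listed under `color' above can be recognized with a few lines of Python. This is a sketch with a made-up function name; accepting the keywords in any letter case is an assumption.

```python
def parse_boolean(text):
    """Map a boolean option value to True or False."""
    word = text.strip().lower()   # assumption: case is not significant
    if word in ("on", "true", "yes"):
        return True
    if word in ("off", "false", "no"):
        return False
    try:
        # Nonzero integers are positive values; zero is negative.
        return int(word) != 0
    except ValueError:
        raise ValueError("not a boolean value: %r" % text) from None
```

For instance, `color=0' and `color=off' would both turn color off under this reading.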



5.6.2 PostScript page options

These options affect page setup:

headers=boolean

Controls whether the standard headers showing the time and date and title and subtitle are printed at the top of each page. Default: on.

paper-size=paper-size

Paper size, either as a symbolic name (e.g., letter or a4) or specific measurements (e.g., 8-1/2x11 or "210 x 297"). See section Paper sizes. Default: letter.

orientation=orientation

Either portrait or landscape. Default: portrait.

left-margin=dimension
right-margin=dimension
top-margin=dimension
bottom-margin=dimension

Sets the margins around the page. The headers, if enabled, are not included in the margins; they are in addition to the margins. For a description of dimensions, see 5.5.4 Dimensions. Default: 0.5in.



5.6.3 PostScript file options

Oh, my. You don't really want to know about the way that the PostScript driver deals with files, do you? Well I suppose you're entitled, but I warn you right now: it's not pretty. Here goes....

First let's look at the options that are available:

font-dir=font-directory

Sets the font directory. Default: devps.

prologue-file=prologue-file-name

Sets the name of the PostScript prologue file. You can write your own prologue, though I have no idea why you'd want to: see 5.6.6 The PostScript prologue. Default: ps-prologue.

device-file=device-file-name

Sets the name of the Groff-format device description file. The PostScript driver reads this in order to know about the scaling of fonts and so on. The format of such files is described in groff_font(5), included with Groff. Default: DESC.

encoding-file=encoding-file-name

Sets the name of the encoding file. This file contains a list of all font encodings that will be needed so that the driver can put all of them at the top of the prologue. See section 5.6.7 PostScript encodings. Default: ps-encodings.

If the specified encoding file cannot be found, this error will be silently ignored, since most people do not need any encodings besides the ones that can be found using auto-encode, described below.

auto-encode=boolean

When enabled, the font encodings needed by the default proportional- and fixed-pitch fonts will automatically be dumped to the PostScript output. Otherwise, it is assumed that the user has an encoding file and knows how to use it (see section 5.6.7 PostScript encodings). There is probably no good reason to turn off this convenient feature. Default: on.

Next I suppose it's time to describe the search algorithm. When the PostScript driver needs a file, whether that file be a font, a PostScript prologue, or what you will, it searches in this manner:

  1. Constructs a path by taking the first of the following that is defined:

    1. Environment variable STAT_GROFF_FONT_PATH. See section 5.4 Environment variables.

    2. Environment variable GROFF_FONT_PATH.

    3. The compiled-in fallback default.

  2. Constructs a base name from concatenating, in order, the font directory, a path separator (`/' or `\'), and the file to be found. A typical base name would be something like devps/ps-encodings.

  3. Searches for the base name in the path constructed above. If the file is found, the algorithm terminates.

  4. Searches for the base name in the standard configuration path. See 5.1 Locating configuration files, for more details. If the file is found, the algorithm terminates.

  5. At this point we remove the font directory and path separator from the base name. Now the base name is simply the file to be found, e.g., ps-encodings.

  6. Searches for the base name in the path constructed in the first step. If the file is found, the algorithm terminates.

  7. Searches for the base name in the standard configuration path. If the file is found, the algorithm terminates.

  8. The algorithm terminates unsuccessfully.

So, as you see, there are several ways to configure the PostScript drivers. Careful selection of techniques can make the configuration very flexible indeed.
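
The eight-step search can be condensed into a short Python sketch. It is an illustration of the steps above rather than the driver's actual code. The function names are made up, both paths are passed in as lists of directories, and the `exists' parameter stands in for the file system so the logic can be tried in isolation.

```python
import os

def groff_font_path(env, fallback):
    """Step 1: the first defined of the two variables, else the fallback list."""
    for key in ("STAT_GROFF_FONT_PATH", "GROFF_FONT_PATH"):
        if key in env:
            return env[key].split(os.pathsep)
    return fallback

def find_postscript_file(file_name, font_dir, groff_path, config_path,
                         exists=os.path.isfile):
    """Steps 2 to 8: return the first matching file name, or None."""
    def search(base_name, path):
        for directory in path:
            candidate = os.path.join(directory, base_name)
            if exists(candidate):
                return candidate
        return None

    # First try font_dir/file_name (steps 2-4), then file_name alone
    # (steps 5-7). Each base name is tried in the Groff font path and
    # then in the standard configuration path.
    for base_name in (os.path.join(font_dir, file_name), file_name):
        for path in (groff_path, config_path):
            found = search(base_name, path)
            if found is not None:
                return found
    return None                      # step 8: unsuccessful
```

Seen this way, the algorithm is a two-by-two search: two candidate base names, each tried against two paths, with the more specific combination tried first.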



5.6.4 PostScript font options

The list of available font options is short and sweet:

prop-font=font-name

Sets the default proportional font. The name should be that of a PostScript font. Default: "Helvetica".

fixed-font=font-name

Sets the default fixed-pitch font. The name should be that of a PostScript font. Default: "Courier".

font-size=font-size

Sets the size of the default fonts, in thousandths of a point. Default: 10000 (that is, 10 points).



5.6.5 PostScript line options

Most tables contain lines, or rules, between cells. Some features of the way that lines are drawn in PostScript tables are user-definable:

line-style=style

Sets the style used for lines used to divide tables into sections. style must be either thick, in which case thick lines are used, or double, in which case double lines are used. Default: thick.

line-gutter=dimension

Sets the line gutter, which is the amount of whitespace on either side of lines that border text or graphics objects. See section 5.5.4 Dimensions. Default: 0.5pt.

line-spacing=dimension

Sets the line spacing, which is the amount of whitespace that separates lines that are side by side, as in a double line. Default: 0.5pt.

line-width=dimension

Sets the width of a typical line used in tables. Default: 0.5pt.

line-width-thick=dimension

Sets the width of a thick line used in tables. Not used if line-style is set to double. Default: 1.5pt.



5.6.6 The PostScript prologue

Most PostScript files that are generated mechanically by programs consist of two parts: a prologue and a body. The prologue is generally a collection of boilerplate. Only the body differs greatly between two outputs from the same program. This is also the strategy used in the PSPP PostScript driver. In general, the prologue supplied with PSPP will be more than sufficient. In this case, you will not need to read the rest of this section. However, hackers might want to know more. Read on, if you fall into this category.

The prologue is dumped into the output stream essentially unmodified. However, two actions are performed on its lines. First, certain lines may be omitted as specified in the prologue file itself. Second, variables are substituted.

The following lines are omitted:

  1. All lines that contain three bangs in a row (!!!).

  2. Lines that contain !eps, if the PostScript driver is producing ordinary PostScript output. Otherwise an EPS file is being produced, and the line is included in the output, although everything following !eps is deleted.

  3. Lines that contain !ps, if the PostScript driver is producing EPS output. Otherwise, ordinary PostScript is being produced, and the line is included in the output, although everything following !ps is deleted.
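
The three omission rules can be sketched in Python. This is a minimal illustration, not the driver's actual code: the function name filter_prologue_lines is hypothetical, lines are assumed to arrive without their trailing newline, and the sketch assumes that the marker itself is deleted along with everything that follows it.

```python
def filter_prologue_lines(lines, eps):
    """Yield the prologue lines to keep; `eps` is True for EPS output."""
    # The marker matching the current output kind keeps its line;
    # the marker for the other kind causes the line to be omitted.
    keep_marker, drop_marker = ("!eps", "!ps") if eps else ("!ps", "!eps")
    for line in lines:
        if "!!!" in line:        # rule 1: three bangs in a row
            continue
        if drop_marker in line:  # rules 2 and 3: wrong output kind
            continue
        if keep_marker in line:
            # Assumption: the marker and everything after it are deleted.
            line = line[:line.index(keep_marker)]
        yield line
```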

The following are the variables that are substituted. Only the variables listed are substituted; environment variables are not. See section 5.4.2 Environment substitutions.

bounding-box

The page bounding box, in points, as four space-separated numbers. For U.S. letter size paper, this is `0 0 612 792'.

creator

PSPP version as a string: `GNU PSPP 0.1b', for example.

date

Date the file was created. Example: `Tue May 21 13:46:22 1991'.

data

Value of the data PostScript driver option, as one of the strings `Clean7Bit', `Clean8Bit', or `Binary'.

orientation

Page orientation, as one of the strings Portrait or Landscape.

user

Under multiuser OSes, the user's login name, taken either from the environment variable LOGNAME or, if that fails, the result of the C library function getlogin(). Defaults to `nobody'.

host

System hostname as reported by gethostname(). Defaults to `nowhere'.

prop-font

Name of the default proportional font, prefixed by the word `font' and a space. Example: `font Times-Roman'.

fixed-font

Name of the default fixed-pitch font, prefixed by the word `font' and a space.

scale-factor

The page scaling factor as a floating-point number. Example: 1.0. Note that this is also passed as an argument to the BP macro.

paper-length
paper-width

The paper length and paper width, respectively, in thousandths of a point. Note that these are also passed as arguments to the BP macro.

left-margin
top-margin

The left margin and top margin, respectively, in thousandths of a point. Note that these are also passed as arguments to the BP macro.

title

Document title as a string. This is not the title specified in the PSPP syntax file. A typical title is the word `PSPP' followed by the syntax file name in parentheses. Example: `PSPP (<stdin>)'.

source-file

PSPP syntax file name. Example: `mary96/first.stat'.

Any other questions about the PostScript prologue can best be answered by examining the default prologue or the PSPP source.



5.6.7 PostScript encodings

PostScript fonts often contain many more than 256 characters, in order to accommodate foreign language characters and special symbols. PostScript uses encodings to map these onto single-byte symbol sets. Each font can have many different encodings applied to it.

PSPP's PostScript driver needs to know which encoding to apply to each font. It can determine this from the information encapsulated in the Groff font description that it reads. However, there is an additional problem: for efficiency, the PostScript driver needs to have a complete list of all encodings that will be used in the entire session when it opens the output file. For this reason, it can't use the information built into the fonts, because at that point it doesn't yet know which fonts will be used.

As a stopgap solution, there are two mechanisms for specifying which encodings will be used. The first mechanism is automatic and it is the only one that most PSPP users will ever need. The second mechanism is manual, but it is more flexible. Either mechanism or both may be used at one time.

The first mechanism is activated by the `auto-encode' driver option (see section 5.6.3 PostScript file options). When enabled, `auto-encode' causes the PostScript driver to include the encodings used by the default proportional and fixed-pitch fonts (see section 5.6.4 PostScript font options). Many PSPP output files will only need these encodings.

The second mechanism is the file specified by the `encoding-file' option (see section 5.6.3 PostScript file options). If it exists, this file must consist of lines in PSPP configuration-file format (see section 5.3 Configuration files). Each line that is not a comment should name a PostScript encoding to include in the output.

It is not an error if an encoding is included more than once, by either mechanism. It will appear only once in the output. It is also not an error if an encoding is included in the output but never used. It is an error if an encoding is used but not included by one of these mechanisms. In this case, the built-in PostScript encoding `ISOLatin1Encoding' is substituted.
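
The way the two mechanisms combine can be sketched in Python. This is an illustration only: the function names and parameters are hypothetical, and the sketch assumes that comment lines in the encoding file begin with `#' (see section 5.3 Configuration files for the real format).

```python
def encodings_to_include(auto_encode, default_font_encodings, encoding_file_lines):
    """Return the encodings to write to the output, each exactly once."""
    included = []

    def add(name):
        # Including an encoding twice is not an error; it appears once.
        if name not in included:
            included.append(name)

    if auto_encode:
        # First mechanism: encodings of the default fonts.
        for name in default_font_encodings:
            add(name)
    # Second mechanism: one encoding name per non-comment line.
    for line in encoding_file_lines:
        name = line.strip()
        # Assumption: blank lines are skipped and comments start with '#'.
        if name and not name.startswith("#"):
            add(name)
    return included


def resolve_encoding(name, included):
    """Substitute the built-in encoding for one that was never included."""
    return name if name in included else "ISOLatin1Encoding"
```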



5.7 The ASCII driver class

The ASCII driver class produces output that can be displayed on a terminal or sent to a printer. It is highly configurable. The ASCII driver has class name `ascii'.

The ASCII driver is described in further detail below.

5.7.1 ASCII output options  Output file options.
5.7.2 ASCII page options  Page size, margins, more.
5.7.3 ASCII font options  Box character, bold & italics.



5.7.1 ASCII output options

output-file=filename

File to which output should be sent. This can be an ordinary filename (e.g., "pspp.ps"), a pipe filename (e.g., "|lpr"), or stdout ("-"). Default: "pspp.list".

char-set=char-set-type

One of `ascii' or `latin1'. This has no effect on output at the present time. Default: ascii.

form-feed-string=form-feed-value

The string written to the output to cause a formfeed. See also paginate, described below, for a related setting. Default: "\f".

newline-string=newline-value

The string written to the output to cause a newline (carriage return plus linefeed). The default, which can be specified explicitly with newline-string=default, is to use the system-dependent newline sequence by opening the output file in text mode. This is usually the right choice.

However, newline-string can be set to any string. When this is done, the output file is opened in binary mode.

paginate=boolean

If set, a formfeed (as set in form-feed-string, described above) will be written to the device after every page. Default: on.

tab-width=tab-width-value

The distance between tab stops for this device. If set to 0, tabs will not be used in the output. Default: 8.

init=initialization-string

String written to the device before anything else, at the beginning of the output. Default: "" (the empty string).

done=finalization-string

String written to the device after everything else, at the end of the output. Default: "" (the empty string).



5.7.2 ASCII page options

These options affect page setup:

headers=boolean

If enabled, two lines of header information giving title and subtitle, page number, date and time, and PSPP version are printed at the top of every page. These two lines are in addition to any top margin requested. Default: on.

length=line-count

Physical length of a page, in lines. Headers and margins are subtracted from this value. Default: 66.

width=character-count

Physical width of a page, in characters. Margins are subtracted from this value. Default: 130.

lpi=lines-per-inch

Number of lines per vertical inch. Not currently used. Default: 6.

cpi=characters-per-inch

Number of characters per horizontal inch. Not currently used. Default: 10.

left-margin=left-margin-width

Width of the left margin, in characters. PSPP subtracts this value from the page width. Default: 0.

right-margin=right-margin-width

Width of the right margin, in characters. PSPP subtracts this value from the page width. Default: 0.

top-margin=top-margin-lines

Length of the top margin, in lines. PSPP subtracts this value from the page length. Default: 2.

bottom-margin=bottom-margin-lines

Length of the bottom margin, in lines. PSPP subtracts this value from the page length. Default: 2.
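
The subtractions described above determine how much of each page is left for output. A small Python sketch of the arithmetic follows; the function name usable_page_size is hypothetical, and the keyword defaults simply mirror the option defaults listed in this section.

```python
def usable_page_size(length=66, width=130, headers=True,
                     left_margin=0, right_margin=0,
                     top_margin=2, bottom_margin=2):
    """Return (lines, columns) left for output after headers and margins."""
    # The two header lines are in addition to any top margin requested.
    header_lines = 2 if headers else 0
    lines = length - header_lines - top_margin - bottom_margin
    columns = width - left_margin - right_margin
    return lines, columns
```

With every option at its default, this leaves 66 - 2 - 2 - 2 = 60 lines of 130 characters each.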



5.7.3 ASCII font options

These are the ASCII font options:

box[line-type]=box-chars

The characters used for lines in tables produced by the ASCII driver can be changed using this option. line-type is used to indicate which type of line to change; box-chars is the character or string of characters to use for this type of line.

line-type must be a 4-digit number in base 4. The digits are in the order `right', `bottom', `left', `top'. The four possibilities for each digit are:

0
No line.

1
Single line.

2
Double line.

3
Special device-defined line, if one is available; otherwise, a double line.

Examples:

box[0101]="|"

Sets `|' as the character to use for a single-width line with bottom and top components.

box[2222]="#"

Sets `#' as the character to use for the intersection of four double-width lines, one each from the top, bottom, left and right.

box[1100]="\xda"

Sets `\xda', which under MS-DOS is a box character suitable for the top-left corner of a box, as the character for the intersection of two single-width lines, one each from the right and bottom.
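
To read a line-type at a glance, it can help to decode its four digits. The following Python sketch does so; the function name decode_line_type and the labels it returns are hypothetical, chosen only for illustration.

```python
LINE_KINDS = {"0": "none", "1": "single", "2": "double", "3": "special"}
SIDES = ("right", "bottom", "left", "top")  # digit order in line-type


def decode_line_type(line_type):
    """Map a 4-digit base-4 line-type string to the line kind on each side."""
    if len(line_type) != 4 or any(d not in LINE_KINDS for d in line_type):
        raise ValueError("line-type must be four digits, each from 0 to 3")
    return {side: LINE_KINDS[d] for side, d in zip(SIDES, line_type)}
```

For instance, decode_line_type("1100") reports single lines on the right and bottom and no line on the left or top, which matches the top-left corner of a box.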

Defaults:

italic-on=italic-on-string

Character sequence written to turn on italics or underline printing. If this is set to overstrike, then the driver will simulate underlining by overstriking with underscore characters (`_') in the manner described by overstrike-style and carriage-return-style. Default: overstrike.

italic-off=italic-off-string

Character sequence to turn off italics or underline printing. Default: "" (the empty string).

bold-on=bold-on-string

Character sequence written to turn on bold or emphasized printing. If set to overstrike, then the driver will simulate bold printing by overstriking characters in the manner described by overstrike-style and carriage-return-style. Default: overstrike.

bold-off=bold-off-string

Character sequence to turn off bold or emphasized printing. Default: "" (the empty string).

bold-italic-on=bold-italic-on-string

Character sequence written to turn on bold-italic printing. If set to overstrike, then the driver will simulate bold-italics by overstriking twice, once with the character, a second time with an underscore (`_') character, in the manner described by overstrike-style and carriage-return-style. Default: overstrike.

bold-italic-off=bold-italic-off-string

Character sequence to turn off bold-italic printing. Default: "" (the empty string).

overstrike-style=overstrike-option

Either single or line:

single is recommended for use with ttys and programs that understand overstriking in text files, such as the pager less. single will also work with printer devices but results in rapid back-and-forth motions of the printhead that can cause the printer to physically overheat!

line is recommended for use with printer devices. Most programs that understand overstriking in text files will not properly deal with line mode.

Default: single.

carriage-return-style=carriage-return-type

Either bs or cr. This option applies only when one or more of the font commands is set to overstrike and, at the same time, overstrike-style is set to line.

Although cr is preferred as being more compact, bs is more general since some devices do not interpret carriage returns in the desired manner. Default: bs.
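
The difference between the overstrike settings can be illustrated with a Python sketch. The exact byte sequences that the driver emits are not specified here, so the ordering below (text first, then the overstriking characters) is an assumption, as are the function name overstrike and its parameters. The cr case also assumes that the text begins at the left edge of the line.

```python
def overstrike(text, underline=False, style="single", carriage_return_style="bs"):
    """Return `text` with simulated bold (default) or underline."""
    # Underline overstrikes with '_'; bold overstrikes each character with itself.
    over = "_" * len(text) if underline else text
    if style == "single":
        # Character by character: print, back up one column, print again.
        return "".join(f"{c}\b{o}" for c, o in zip(text, over))
    if style == "line":
        # Whole run at once: print it, return to its start, print it again.
        back = "\r" if carriage_return_style == "cr" else "\b" * len(text)
        return text + back + over
    raise ValueError("style must be 'single' or 'line'")
```

This shows why cr is the more compact choice: a single carriage return replaces one backspace per character.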



5.8 The HTML driver class

The html driver class is used to produce output for viewing in tables-capable web browsers such as Emacs' w3-mode. Its configuration is very simple. Currently, the output has a very plain format. In the future, further work may be done on improving the output appearance.

There are only a few options for use with the html driver class:

output-file=filename

File to which output should be sent. This can be an ordinary filename (e.g., "pspp.ps"), a pipe filename (e.g., "|lpr"), or stdout ("-"). Default: "pspp.html".

prologue-file=prologue-file-name

Sets the name of the HTML prologue file. You can write your own prologue if you want to customize colors or other settings: see 5.8.1 The HTML prologue. Default: html-prologue.

5.8.1 The HTML prologue  Format of the HTML prologue file.



5.8.1 The HTML prologue

HTML files that are generated by PSPP consist of two parts: a prologue and a body. The prologue is a collection of boilerplate. Only the body differs greatly between two outputs. You can tune the colors and other attributes of the output by editing the prologue.

The prologue is dumped into the output stream essentially unmodified. However, two actions are performed on its lines. First, certain lines may be omitted as specified in the prologue file itself. Second, variables are substituted.

The following lines are omitted:

  1. All lines that contain three bangs in a row (!!!).

  2. Lines that contain !title, if no title is set for the output. If a title is set, then the characters !title are removed before the line is output.

  3. Lines that contain !subtitle, if no subtitle is set for the output. If a subtitle is set, then the characters !subtitle are removed before the line is output.
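
These rules can be sketched in Python as follows. The function name filter_html_prologue and its parameters are hypothetical, lines are assumed to arrive without their trailing newline, and variable substitution is deliberately left out.

```python
def filter_html_prologue(lines, title=None, subtitle=None):
    """Yield the HTML prologue lines to keep, with markers removed."""
    for line in lines:
        if "!!!" in line:  # rule 1: three bangs in a row
            continue
        omit = False
        for marker, value in (("!title", title), ("!subtitle", subtitle)):
            if marker in line:
                if not value:
                    # Rules 2 and 3: no title or subtitle set, so omit the line.
                    omit = True
                    break
                # Otherwise only the marker characters are removed.
                line = line.replace(marker, "")
        if not omit:
            yield line
```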

The following are the variables that are substituted. Only the variables listed are substituted; environment variables are not. See section 5.4.2 Environment substitutions.

generator

PSPP version as a string: `GNU PSPP 0.1b', for example.

date

Date the file was created. Example: `Tue May 21 13:46:22 1991'.

user

Under multiuser OSes, the user's login name, taken either from the environment variable LOGNAME or, if that fails, the result of the C library function getlogin(). Defaults to `nobody'.

host

System hostname as reported by gethostname(). Defaults to `nowhere'.

title

Document title as a string. This is the title specified in the PSPP syntax file.

subtitle

Document subtitle as a string.

source-file

PSPP syntax file name. Example: `mary96/first.stat'.



5.9 Miscellaneous configuration

The following environment variables can be used to further configure PSPP:

HOME

Used to determine the user's home directory. No default value.

STAT_INCLUDE_PATH

Path used to find include files in PSPP syntax files. Defaults vary across operating systems:

UNIX

  • `.'

  • `~/.pspp/include'

  • `/usr/local/lib/pspp/include'

  • `/usr/lib/pspp/include'

  • `/usr/local/share/pspp/include'

  • `/usr/share/pspp/include'

MS-DOS

  • `.'

  • `C:\PSPP\INCLUDE'

  • `$PATH'

Other OSes
No default path.

STAT_PAGER
PAGER

When PSPP invokes an external pager, it uses the first of these that is defined. There is a default pager only if the person who compiled PSPP defined one.

TERM

The terminal type termcap or ncurses will use, if such support was compiled into PSPP.

STAT_OUTPUT_INIT_FILE

The basename used to search for the driver definition file. See section 5.5 Output devices. See section 5.1 Locating configuration files. Default: devices.

STAT_OUTPUT_PAPERSIZE_FILE

The basename used to search for the papersize file. See section 5.5.5 Paper sizes. See section 5.1 Locating configuration files. Default: papersize.

STAT_OUTPUT_INIT_PATH

The path used to search for the driver definition file and the papersize file. See section 5.1 Locating configuration files. Default: the standard configuration path.

TMPDIR

The sort procedure stores its temporary files in this directory. Default: (UNIX) `/tmp', (MS-DOS) `\', (other OSes) empty string.

TEMP
TMP

Under MS-DOS only, these variables are consulted after TMPDIR, in this order.
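
The lookup order for the temporary directory can be sketched in Python. The function name sort_temp_dir and the system labels "unix" and "msdos" are hypothetical, and the sketch assumes that a variable set to the empty string counts as unset.

```python
def sort_temp_dir(environ, system):
    """Return the directory for sort temporaries on the given system."""
    # TMPDIR is always consulted first; TEMP and TMP only under MS-DOS.
    names = ["TMPDIR"] + (["TEMP", "TMP"] if system == "msdos" else [])
    for name in names:
        if environ.get(name):  # assumption: empty means unset
            return environ[name]
    # Documented defaults: `/tmp' on UNIX, `\' on MS-DOS, empty elsewhere.
    return {"unix": "/tmp", "msdos": "\\"}.get(system, "")
```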



5.10 Improving output quality

When its drivers are set up properly, PSPP can produce output that looks very good indeed. The PostScript driver, suitably configured, can produce presentation-quality output. Here are a few guidelines for producing better-looking output, regardless of output driver. Your mileage may vary, of course, and everyone has different esthetic preferences.