A “character set” is basically a mapping between bytes and glyphs and implies a certain character encoding scheme. For example, the ISO 8859 family of character sets uses an encoding of 8 bits per character. For the Unicode character set, different character encodings may be used, UTF-8 being the most popular. In UTF-8, a character is represented using a variable number of bytes ranging from 1 to 4.
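As a small illustration of the variable-length encoding, the following Python sketch counts the bytes UTF-8 needs for a few sample characters. The helper name utf8_length and the sample characters are chosen for this example only.

```python
def utf8_length(character):
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(character.encode("utf-8"))


# Sample characters, written as escapes so the source stays pure ASCII.
SAMPLES = {
    "LATIN SMALL LETTER A": "a",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "EURO SIGN": "\u20ac",
    "GRINNING FACE": "\U0001f600",
}

for name, character in SAMPLES.items():
    print(name, utf8_length(character))

# In an ISO 8859 character set every representable character is one byte.
print(len("\u00e9".encode("iso-8859-1")))
```

The four samples need one, two, three and four bytes respectively, while the ISO 8859-1 encoding of the accented letter is a single byte.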
Since Mutt is a command-line tool run from a shell, and delegates
certain tasks to external tools (such as an editor for composing/editing
messages), all of these tools need to agree on a character set and
encoding. There exists no way to reliably deduce the character set a
plain text file has. Interoperability is gained by the use of
well-defined environment variables. The full set can be printed by
issuing locale on the command line.
Upon startup, Mutt determines the character set on its own using
routines that inspect locale-specific environment variables. Therefore,
it is generally not necessary to set the $charset
variable in Mutt. It may even be counter-productive as Mutt uses system
and library functions that derive the character set themselves and on
which Mutt has no influence. It's safest to let Mutt work out the locale
setup itself.
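How the locale environment variables select a character set can be sketched in Python as follows. This is only an illustration of the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and of the language_TERRITORY.codeset@modifier naming convention; it is not how Mutt itself is implemented, since real programs ask the C library instead of parsing names. The function names are hypothetical.

```python
import os


def effective_ctype_locale(env=os.environ):
    """Return the locale name that governs character handling, or None."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return None


def codeset_of(locale_name):
    """Extract the codeset from a name such as de_DE.ISO-8859-15@euro."""
    if locale_name is None or "." not in locale_name:
        # No explicit codeset: the system picks an implementation default.
        return None
    after_dot = locale_name.split(".", 1)[1]
    return after_dot.split("@", 1)[0]


print(codeset_of(effective_ctype_locale({"LANG": "en_US.UTF-8"})))
```

Passing a plain dictionary instead of the real environment makes it easy to try out different variable combinations.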
If you work with several character sets on a regular basis, it's highly advisable to use Unicode and a UTF-8 locale. Unicode can represent nearly all characters in a message at the same time. Without a Unicode locale, you may receive messages containing characters that cannot be represented in your locale. When such a message is displayed, replied to, or forwarded, information may be lost, possibly rendering the message unusable, not only for you but also for the recipient. This breakage is not reversible, as lost information cannot be guessed.
A Unicode locale makes all conversions superfluous which eliminates the risk of conversion errors. It also eliminates potentially wrong expectations about the character set between Mutt and external programs.
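The irreversible loss described above can be reproduced with a short Python sketch. It converts a sample string to ISO 8859-1, which has no Euro sign, and back again. The sample text and the helper name are made up for this illustration, and the replacement character used for unrepresentable input is a property of Python's "replace" error handler, not of Mutt.

```python
def to_locale_charset(text, charset):
    """Encode text, substituting characters the charset cannot represent."""
    return text.encode(charset, errors="replace")


original = "Gr\u00fc\u00dfe, 10 \u20ac"  # u-umlaut, sharp s, Euro sign

latin1_bytes = to_locale_charset(original, "iso-8859-1")
latin1_round_trip = latin1_bytes.decode("iso-8859-1")

utf8_bytes = to_locale_charset(original, "utf-8")
utf8_round_trip = utf8_bytes.decode("utf-8")

print(latin1_round_trip == original)  # the Euro sign was lost
print(utf8_round_trip == original)    # nothing was lost
```

Once the Euro sign has been replaced, nothing in the converted bytes tells a later reader which character was originally there.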
The terminal emulator used also must be properly configured for the current locale. Terminal emulators usually do not derive the locale from environment variables; they need to be configured separately. If the terminal is incorrectly configured, Mutt may display random and unexpected characters (question marks, octal codes, or just random glyphs), format strings may not work as expected, you may not be able to enter non-ASCII characters, and possibly more. Data is always represented using bytes, so a correct setup is very important: to the machine, all character sets “look” the same.
Warning: A mismatch between the locale that system and library functions
assume and the locale Mutt was told to use may make Mutt behave
badly with non-ASCII input: it will fail at seemingly random places.
This warning is to be taken seriously since not only local mail handling
may suffer: sent messages may carry wrong character set information that the
receiver has to deal with. The need to set
$charset directly in most cases points at terminal
and environment variable setup problems, not Mutt problems.
A list of officially assigned and known character sets can be found at
IANA;
a list of locally supported locales can be obtained by running
locale -a.
All string patterns in Mutt including those in more complex patterns must be specified using regular expressions (regexp) in the “POSIX extended” syntax (which is more or less the syntax used by egrep and GNU awk). For your convenience, we have included below a brief description of this syntax.
The search is case sensitive if the pattern contains at least one upper case letter, and case insensitive otherwise.
“\” must be quoted if used for a regular expression in an initialization command: “\\”.
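The case-sensitivity rule stated above can be sketched in Python. Python's re module is used here only as a stand-in for the POSIX extended regular expression engine, which is adequate for simple patterns like the ones shown; the function name compile_smart_case is hypothetical.

```python
import re


def compile_smart_case(pattern):
    """Compile pattern case-insensitively unless it has an upper-case letter."""
    has_upper = any(character.isupper() for character in pattern)
    flags = 0 if has_upper else re.IGNORECASE
    return re.compile(pattern, flags)


# An all-lower-case pattern matches regardless of case.
print(bool(compile_smart_case("mutt").search("Mutt users")))

# A pattern with an upper-case letter must match exactly.
print(bool(compile_smart_case("Mutt").search("mutt users")))
```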
A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions.
The regular expression can be enclosed/delimited by either " or ' which is useful if the regular expression includes a white-space character. See Syntax of Initialization Files for more information on " and ' delimiter processing. To match a literal " or ' you must preface it with \ (backslash).
The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash.
The period “.” matches any single character. The caret “^” and the dollar sign “$” are metacharacters that respectively match the empty string at the beginning and end of a line.
A list of characters enclosed by “[” and “]” matches any single character in that list; if the first character of the list is a caret “^” then it matches any character not in the list. For example, the regular expression [0123456789] matches any single digit. A range of ASCII characters may be specified by giving the first and last characters, separated by a hyphen “-”. Most metacharacters lose their special meaning inside lists. To include a literal “]” place it first in the list. Similarly, to include a literal “^” place it anywhere but first. Finally, to include a literal hyphen “-” place it last.
Certain named classes of characters are predefined. Character classes consist of “[:”, a keyword denoting the class, and “:]”. The classes defined by the POSIX standard are listed in Table 4.1, “POSIX regular expression character classes”.
Table 4.1. POSIX regular expression character classes
| Character class | Description |
|---|---|
| [:alnum:] | Alphanumeric characters |
| [:alpha:] | Alphabetic characters |
| [:blank:] | Space or tab characters |
| [:cntrl:] | Control characters |
| [:digit:] | Numeric characters |
| [:graph:] | Characters that are both printable and visible. (A space is printable, but not visible, while an “a” is both) |
| [:lower:] | Lower-case alphabetic characters |
| [:print:] | Printable characters (characters that are not control characters) |
| [:punct:] | Punctuation characters (characters that are not letters, digits, control characters, or space characters) |
| [:space:] | Space characters (such as space, tab and formfeed, to name a few) |
| [:upper:] | Upper-case alphabetic characters |
| [:xdigit:] | Characters that are hexadecimal digits |
A character class is only valid in a regular expression inside the brackets of a character list.
Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket list. For example, [[:digit:]] is equivalent to [0-9].
Two additional special sequences can appear in character lists. These apply to non-ASCII character sets, which can have single symbols (called collating elements) that are represented with more than one character, as well as several characters that are equivalent for collating or sorting purposes:
A collating symbol is a multi-character collating element enclosed in “[.” and “.]”. For example, if “ch” is a collating element, then [[.ch.]] is a regexp that matches this collating element, while [ch] is a regexp that matches either “c” or “h”.
An equivalence class is a locale-specific name for a list of characters that are equivalent. The name is enclosed in “[=” and “=]”. For example, the name “e” might be used to represent all of “e” with grave (“è”), “e” with acute (“é”) and “e”. In this case, [[=e=]] is a regexp that matches any of: “e” with grave (“è”), “e” with acute (“é”) and “e”.
A regular expression matching a single character may be followed by one of several repetition operators described in Table 4.2, “Regular expression repetition operators”.
Table 4.2. Regular expression repetition operators
| Operator | Description |
|---|---|
| ? | The preceding item is optional and matched at most once |
| * | The preceding item will be matched zero or more times |
| + | The preceding item will be matched one or more times |
| {n} | The preceding item is matched exactly n times |
| {n,} | The preceding item is matched n or more times |
| {,m} | The preceding item is matched at most m times |
| {n,m} | The preceding item is matched at least n times, but no more than m times |
Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated subexpressions.
Two regular expressions may be joined by the infix operator “|”; the resulting regular expression matches any string matching either subexpression.
Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole subexpression may be enclosed in parentheses to override these precedence rules.
If you compile Mutt with the included regular expression engine, the following operators may also be used in regular expressions as described in Table 4.3, “GNU regular expression extensions”.
Table 4.3. GNU regular expression extensions
| Expression | Description |
|---|---|
| \\y | Matches the empty string at either the beginning or the end of a word |
| \\B | Matches the empty string within a word |
| \\< | Matches the empty string at the beginning of a word |
| \\> | Matches the empty string at the end of a word |
| \\w | Matches any word-constituent character (letter, digit, or underscore) |
| \\W | Matches any character that is not word-constituent |
| \\` | Matches the empty string at the beginning of a buffer (string) |
| \\' | Matches the empty string at the end of a buffer |
Please note however that these operators are not defined by POSIX, so they may or may not be available in stock libraries on various systems.
Many of Mutt's commands allow you to specify a pattern to match
(limit, tag-pattern,
delete-pattern, etc.). Table 4.4, “Pattern modifiers”
shows several ways to select messages.
Table 4.4. Pattern modifiers
| Pattern modifier | Description |
|---|---|
| ~A | all messages |
| ~b EXPR | messages which contain EXPR in the message body |
| =b STRING | messages which contain STRING in the message body. If IMAP is enabled, searches for STRING on the server, rather than downloading each message and searching it locally. |
| ~B EXPR | messages which contain EXPR in the whole message |
| ~c EXPR | messages carbon-copied to EXPR |
| %c GROUP | messages carbon-copied to any member of GROUP |
| ~C EXPR | messages either to: or cc: EXPR |
| %C GROUP | messages either to: or cc: to any member of GROUP |
| ~d [MIN]-[MAX] | messages with “date-sent” in a Date range |
| ~D | deleted messages |
| ~e EXPR | messages which contain EXPR in the “Sender” field |
| %e GROUP | messages which contain a member of GROUP in the “Sender” field |
| ~E | expired messages |
| ~F | flagged messages |
| ~f EXPR | messages originating from EXPR |
| %f GROUP | messages originating from any member of GROUP |
| ~g | cryptographically signed messages |
| ~G | cryptographically encrypted messages |
| ~h EXPR | messages which contain EXPR in the message header |
| ~H EXPR | messages with a spam attribute matching EXPR |
| ~i EXPR | messages which match EXPR in the “Message-ID” field |
| ~k | messages which contain PGP key material |
| ~L EXPR | messages either originated or received by EXPR |
| %L GROUP | messages either originated or received by any member of GROUP |
| ~l | messages addressed to a known mailing list |
| ~m [MIN]-[MAX] | messages in the range MIN to MAX *) |
| ~n [MIN]-[MAX] | messages with a score in the range MIN to MAX *) |
| ~N | new messages |
| ~O | old messages |
| ~p | messages addressed to you (consults alternates) |
| ~P | messages from you (consults alternates) |
| ~Q | messages which have been replied to |
| ~r [MIN]-[MAX] | messages with “date-received” in a Date range |
| ~R | read messages |
| ~s EXPR | messages having EXPR in the “Subject” field. |
| ~S | superseded messages |
| ~t EXPR | messages addressed to EXPR |
| ~T | tagged messages |
| ~u | messages addressed to a subscribed mailing list |
| ~U | unread messages |
| ~v | messages part of a collapsed thread. |
| ~V | cryptographically verified messages |
| ~x EXPR | messages which contain EXPR in the “References” or “In-Reply-To” field |
| ~X [MIN]-[MAX] | messages with MIN to MAX attachments *) |
| ~y EXPR | messages which contain EXPR in the “X-Label” field |
| ~z [MIN]-[MAX] | messages with a size in the range MIN to MAX *) **) |
| ~= | duplicated messages (see $duplicate_threads) |
| ~$ | unreferenced messages (requires threaded view) |
| ~(PATTERN) | messages in threads containing messages matching PATTERN, e.g. all threads containing messages from you: ~(~P) |
Where EXPR is a regular expression, and GROUP is an address group.
*) The forms “<[MAX]”, “>[MIN]”, “[MIN]-” and “-[MAX]” are allowed, too.
**) The suffixes “K” and “M” are allowed to specify kilobyte and megabyte respectively.
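The range forms and size suffixes described in the two notes above can be sketched as a small Python parser. Several details are assumptions made for this illustration and are not stated in the text: “K” is taken as 1024 and “M” as 1024 * 1024, the “<” and “>” forms are treated as strict inequalities, and a missing bound is returned as None. The function names are hypothetical.

```python
import re

# Assumed multipliers for the size suffixes.
UNITS = {"": 1, "K": 1024, "M": 1024 * 1024}


def _number(text):
    """Parse digits with an optional K or M suffix."""
    match = re.fullmatch(r"(\d+)([KM]?)", text)
    if match is None:
        raise ValueError("not a number: %r" % text)
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_range(spec):
    """Return an inclusive (low, high) pair; None means unbounded."""
    if spec.startswith("<"):
        return (None, _number(spec[1:]) - 1)
    if spec.startswith(">"):
        return (_number(spec[1:]) + 1, None)
    low_text, dash, high_text = spec.partition("-")
    if not dash:
        raise ValueError("expected one of the documented range forms")
    low = _number(low_text) if low_text else None
    high = _number(high_text) if high_text else None
    return (low, high)


print(parse_range("10K-2M"))
print(parse_range(">1M"))
```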
Special attention has to be paid when using regular expressions inside
of patterns. Specifically, Mutt's parser for these patterns will strip
one level of backslash (“\”), which is normally used for
quoting. If it is your intention to use a backslash in the regular
expression, you will need to use two backslashes instead
(“\\”). You can force Mutt to treat
EXPR as a simple string instead of a regular
expression by using = instead of ~ in the pattern name. For example,
=b *.* will find all messages that contain the
literal string “*.*”. Simple string matches are less
powerful than regular expressions but can be considerably faster. This
is especially true for IMAP folders, because string matches can be
performed on the server instead of by fetching every message. IMAP
treats =h specially: it must be of the form
“header: substring” and will not partially match header
names. The substring part may be omitted if you simply wish to find
messages containing a particular header without regard to its value.
Patterns matching lists of addresses (notably c, C, p, P and t) match if there is at least one match in the whole list. If you want to make sure that all elements of that list match, you need to prefix your pattern with “^”. For example, ^~C \.de$ matches all mails which have only recipients from Germany.
Mutt supports two versions of so-called “simple searches”. These are issued if the query entered for searching, limiting and similar operations does not seem to contain a valid pattern modifier (i.e. it does not contain one of these characters: “~”, “=” or “%”). If the query is supposed to contain one of these special characters, it must be escaped by prepending a backslash (“\”).
The first type checks whether the query string equals, case-insensitively,
a keyword from Table 4.5, “Simple search keywords”.
If that is the case, Mutt will use the shown pattern modifier instead.
If a keyword would conflict with your search keyword, you need to turn
it into a regular expression to avoid matching the keyword table. For
example, if you want to find all messages matching “flag”
(using $simple_search)
but don't want to match flagged messages, simply search for
“[f]lag”.
Table 4.5. Simple search keywords
| Keyword | Pattern modifier |
|---|---|
| all | ~A |
| . | ~A |
| ^ | ~A |
| del | ~D |
| flag | ~F |
| new | ~N |
| old | ~O |
| repl | ~Q |
| read | ~R |
| tag | ~T |
| unread | ~U |
The second type of simple search is to build a complex search pattern using $simple_search as a template. Mutt will insert your query properly quoted and search for the composed complex query.
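The two kinds of simple search can be sketched together in Python. This is a rough model, not Mutt's implementation: the template "~f %s | ~s %s" is assumed here as the value of $simple_search, the quoting scheme (double quotes with backslash escapes) is an assumption about what “properly quoted” means, and all function names are hypothetical.

```python
# Keywords from Table 4.5 and the pattern modifiers they stand for.
KEYWORDS = {
    "all": "~A", ".": "~A", "^": "~A", "del": "~D", "flag": "~F",
    "new": "~N", "old": "~O", "repl": "~Q", "read": "~R",
    "tag": "~T", "unread": "~U",
}


def has_pattern_modifier(query):
    """True if query contains an unescaped "~", "=" or "%"."""
    index = 0
    while index < len(query):
        if query[index] == "\\" and index + 1 < len(query):
            index += 2  # skip the backslash and the character it escapes
            continue
        if query[index] in "~=%":
            return True
        index += 1
    return False


def quote(query):
    """Wrap query in double quotes (assumed quoting scheme)."""
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def expand_simple_search(query, template="~f %s | ~s %s"):
    """Turn a simple search query into a full pattern."""
    if has_pattern_modifier(query):
        return query  # already a pattern, leave untouched
    keyword = KEYWORDS.get(query.lower())
    if keyword is not None:
        return keyword  # first type: keyword lookup
    return template.replace("%s", quote(query))  # second type: template


print(expand_simple_search("flag"))
print(expand_simple_search("[f]lag"))
```

The second call shows the workaround mentioned earlier: “[f]lag” is not in the keyword table, so it is inserted into the template instead of being mapped to ~F.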
Logical AND is performed by specifying more than one criterion. For example:
~t mutt ~f elkins
would select messages which contain the word “mutt” in the list of recipients and that have the word “elkins” in the “From” header field.
Mutt also recognizes the following operators to create more complex search patterns:
! — logical NOT operator
| — logical OR operator
() — logical grouping operator
Here is an example illustrating a complex search pattern. This pattern will select all messages which do not contain “mutt” in the “To” or “Cc” field and which are from “elkins”.
!(~t mutt|~c mutt) ~f elkins
Here is an example using white space in the regular expression (note the “'” and “"” delimiters). For this to match, the mail's subject must match the “^Junk +From +Me$” and it must be from either “Jim +Somebody” or “Ed +SomeoneElse”:
'~s "^Junk +From +Me$" ~f ("Jim +Somebody"|"Ed +SomeoneElse")'
If a regular expression contains parentheses or a vertical bar ("|"),
you must enclose the expression in double or single
quotes since those characters are also used to separate different parts
of Mutt's pattern language. For example: ~f
"me@(mutt\.org|cs\.hmc\.edu)". Without the quotes, the
parenthesis wouldn't end, and the expression would be split into two OR'd patterns:
~f me@(mutt\.org and
cs\.hmc\.edu). That is never what you want.
Mutt supports two types of dates, absolute and relative.
Absolute dates must be in DD/MM/YY format (month and year are optional, defaulting to the current month and year). An example of a valid range of dates is:
Limit to messages matching: ~d 20/1/95-31/10
If you omit the minimum (first) date, and just specify “-DD/MM/YY”, all messages before the given date will be selected. If you omit the maximum (second) date, and specify “DD/MM/YY-”, all messages after the given date will be selected. If you specify a single date with no dash (“-”), only messages sent on the given date will be selected.
You can add error margins to absolute dates. An error margin is a sign (+ or -), followed by a digit, followed by one of the units in Table 4.6, “Date units”. As a special case, you can replace the sign by a “*” character, which is equivalent to giving identical plus and minus error margins.
Example: To select any messages two weeks around January 15, 2001, you'd use the following pattern:
Limit to messages matching: ~d 15/1/2001*2w
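The effect of an error margin can be sketched in Python. The sketch assumes that a “+” margin extends the range forwards, a “-” margin extends it backwards, and “*” does both. Only the day and week units are handled, written here as “d” and “w”, which is an assumption since Table 4.6 is not reproduced in this section. The function name is hypothetical.

```python
import re
from datetime import date, timedelta

# Assumed subset of the date units: days and weeks only.
UNIT_DAYS = {"d": 1, "w": 7}


def date_with_margin(day, month, year, margin):
    """Return the (earliest, latest) dates covered by a date and its margin."""
    match = re.fullmatch(r"([+\-*])(\d+)([dw])", margin)
    if match is None:
        raise ValueError("unsupported margin: %r" % margin)
    sign, count, unit = match.groups()
    delta = timedelta(days=int(count) * UNIT_DAYS[unit])
    base = date(year, month, day)
    earliest = base - delta if sign in "-*" else base
    latest = base + delta if sign in "+*" else base
    return earliest, latest


earliest, latest = date_with_margin(15, 1, 2001, "*2w")
print(earliest.isoformat(), latest.isoformat())
```

For the pattern ~d 15/1/2001*2w this gives the range from 1 January 2001 to 29 January 2001.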
A relative date is relative to the current date, and may be specified as:
>offset for messages older than offset units
<offset for messages newer than offset units
=offset for messages exactly offset units old
offset is specified as a positive number with one of the units from Table 4.6, “Date units”.
Example: to select messages less than 1 month old, you would use
Limit to messages matching: ~d <1m
All dates used when searching are relative to the
local time zone, so unless you change the setting
of your $index_format to include a
%[...] format, these are not the
dates shown in the main index.
Sometimes it is desirable to perform an operation on a group of messages
all at once rather than one at a time. An example might be to save
messages to a mailing list to a separate folder, or to delete all
messages with a given subject. To tag all messages matching a pattern,
use the <tag-pattern> function, which is bound
to “shift-T” by default. Or you can select individual
messages by hand using the <tag-message>
function, which is bound to “t” by default. See patterns for Mutt's pattern matching syntax.
Once you have tagged the desired messages, you can use the “tag-prefix” operator, which is the “;” (semicolon) key by default. When the “tag-prefix” operator is used, the next operation will be applied to all tagged messages if that operation can be used in that manner. If the $auto_tag variable is set, the next operation applies to the tagged messages automatically, without requiring the “tag-prefix”.
In macros or push commands, you can use the
<tag-prefix-cond> operator. If there are no
tagged messages, Mutt will “eat” the rest of the macro to
abort its execution. Mutt will stop “eating” the macro
when it encounters the <end-cond> operator;
after this operator the rest of the macro will be executed as normal.
A hook is a concept found in many other programs which allows you to execute arbitrary commands before performing some operation. For example, you may wish to tailor your configuration based upon which mailbox you are reading, or to whom you are sending mail. In the Mutt world, a hook consists of a regular expression or pattern along with a configuration option/command. See the description of the individual hook commands
for specific details on each type of hook available.
If a hook changes configuration settings, these changes remain effective until the end of the current Mutt session. As this is generally not desired, a “default” hook needs to be added before all other hooks of that type to restore configuration defaults.
Example 4.3. Specifying a “default” hook
send-hook . 'unmy_hdr From:'
send-hook ~C'^b@b\.b$' my_hdr from: c@c.c
In Example 4.3, “Specifying a “default” hook”, by default the value of $from and $realname is not overridden. When sending
messages either To: or Cc: to <b@b.b>, the
From: header is changed to <c@c.c>.
Hooks that act upon messages (message-hook, reply-hook, send-hook, send2-hook, save-hook, fcc-hook) are evaluated in a slightly different manner. For the other types of hooks, a regular expression is sufficient. But in dealing with messages a finer grain of control is needed for matching since for different purposes you want to match different criteria.
Mutt allows the use of the search pattern language for matching messages in hook commands. This works in exactly the same way as it would when limiting or searching the mailbox, except that you are restricted to those operators which match information Mutt extracts from the header of the message (i.e., from, to, cc, date, subject, etc.).
For example, if you wanted to set your return address based upon sending mail to a specific address, you could do something like:
send-hook '~t ^me@cs\.hmc\.edu$' 'my_hdr From: Mutt User <user@host>'
which would execute the given command when sending mail to me@cs.hmc.edu.
However, it is not required that you write the pattern to match using the full searching language. You can still specify a simple regular expression like the other hooks, in which case Mutt will translate your pattern into the full language, using the translation specified by the $default_hook variable. The pattern is translated at the time the hook is declared, so the value of $default_hook that is in effect at that time will be used.
Mutt supports connecting to external directory databases such as LDAP, ph/qi, bbdb, or NIS through a wrapper script which connects to Mutt using a simple interface. Using the $query_command variable, you specify the wrapper command to use. For example:
set query_command = "mutt_ldap_query.pl %s"
The wrapper script should accept the query on the command-line. It should return a one line message, then each matching response on a single line, each line containing a tab separated address then name then some other optional information. On error, or if there are no matching addresses, return a non-zero exit code and a one line error message.
An example multiple response output:
Searching database ... 20 entries ... 3 matching:
me@cs.hmc.edu                 Michael Elkins    mutt dude
blong@fiction.net             Brandon Long      mutt and more
roessler@does-not-exist.org   Thomas Roessler   mutt pgp
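A wrapper script following the interface described above could look roughly like this Python sketch. The in-memory ADDRESS_BOOK and the names run_query and main are hypothetical; a real wrapper would query LDAP or another directory instead of a built-in list.

```python
import sys

# Hypothetical stand-in for an external directory database.
ADDRESS_BOOK = [
    ("me@cs.hmc.edu", "Michael Elkins", "mutt dude"),
    ("blong@fiction.net", "Brandon Long", "mutt and more"),
    ("roessler@does-not-exist.org", "Thomas Roessler", "mutt pgp"),
]


def run_query(query, entries=ADDRESS_BOOK):
    """Return (exit_status, output_lines) for a query."""
    needle = query.lower()
    hits = [
        entry for entry in entries
        if needle in entry[0].lower() or needle in entry[1].lower()
    ]
    if not hits:
        # No match: one-line error message and a non-zero exit status.
        return 1, ["No matches for '%s'" % query]
    lines = [
        "Searching database ... %d entries ... %d matching:"
        % (len(entries), len(hits))
    ]
    # One response per line: address, name and extra information,
    # separated by tab characters.
    lines.extend("\t".join(entry) for entry in hits)
    return 0, lines


def main(argv):
    """Entry point: the query is expected as the only argument."""
    if len(argv) != 2:
        print("usage: %s QUERY" % argv[0])
        return 1
    status, lines = run_query(argv[1])
    print("\n".join(lines))
    return status
```

To use this as a script, it would end with a call such as sys.exit(main(sys.argv)), and $query_command would point to it with %s as the argument.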
There are two mechanisms for accessing the query function of Mutt. One
is to do a query from the index menu using the
<query> function (default: Q). This will
prompt for a query, then bring up the query menu which will list the
matching responses. From the query menu, you can select addresses to
create aliases, or to mail. You can tag multiple addresses to mail,
start a new query, or have a new query appended to the current
responses.
The other mechanism for accessing the query function is for address
completion, similar to the alias completion. In any prompt for address
entry, you can use the <complete-query>
function (default: ^T) to run a query based on the current address you
have typed. Like aliases, Mutt will look for what you have typed back
to the last space or comma. If there is a single response for that
query, Mutt will expand the address in place. If there are multiple
responses, Mutt will activate the query menu. At the query menu, you
can select one or more addresses to be added to the prompt.
Mutt supports reading and writing of four different local mailbox formats: mbox, MMDF, MH and Maildir. The mailbox type is auto detected, so there is no need to use a flag for different mailbox types. When creating new mailboxes, Mutt uses the default specified with the $mbox_type variable. A short description of the formats follows.
mbox. This is a widely used mailbox format for UNIX. All messages are stored in a single file. Each message has a line of the form:
From me@cs.hmc.edu Fri, 11 Apr 1997 11:44:56 PST
to denote the start of a new message (this is often referred to as the “From_” line). The mbox format requires mailbox locking and is prone to mailbox corruption when clients write concurrently or when From_ lines are misinterpreted. Depending on the environment, new mail detection can be unreliable. Mbox folders are fast to open and easy to archive.
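The problem of misinterpreted From_ lines can be demonstrated with a deliberately naive Python splitter that starts a new message at every line beginning with “From ”. This is a sketch for illustration only; actual mbox readers, Mutt included, may apply stricter checks. The function name and sample message are made up.

```python
def naive_split_mbox(text):
    """Split mbox text at every line that starts with "From "."""
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            current = []
            messages.append(current)
        if current is not None:
            current.append(line)
    return ["".join(lines) for lines in messages]


# One message whose body contains an unquoted line starting with "From ".
SAMPLE = (
    "From me@cs.hmc.edu Fri, 11 Apr 1997 11:44:56 PST\n"
    "Subject: hello\n"
    "\n"
    "From now on, please use the new address.\n"
)

# The same message with the body line quoted as ">From ".
QUOTED = SAMPLE.replace("\nFrom now on", "\n>From now on")

print(len(naive_split_mbox(SAMPLE)))
print(len(naive_split_mbox(QUOTED)))
```

The unquoted sample is split into two messages although only one was stored, while the quoted variant is read as a single message.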
MMDF. This is a variant of the mbox format. Each message is surrounded by lines containing “^A^A^A^A” (four control-A characters). The same problems as for mbox apply, including that of finding the right message separator, as four control-A characters may appear in message bodies.
MH. A radical departure from
mbox and MMDF, a mailbox
consists of a directory and each message is stored in a separate file.
The filename indicates the message number (however, this may not
correspond to the message number Mutt displays). Deleted messages are
renamed with a comma (“,”) prepended to the filename. Mutt
detects this type of mailbox by looking for either
.mh_sequences or .xmhcache files
(needed to distinguish normal directories from MH mailboxes). MH is more
robust with concurrent clients writing the mailbox, but still may suffer
from lost flags; message corruption is less likely to occur than with
mbox/mmdf. It's usually slower to open compared to mbox/mmdf since many
small files have to be read (Mutt provides Section 7.1, “Header Caching” to greatly speed this process up). Depending
on the environment, MH is not very disk-space efficient.
Maildir. The newest of the mailbox formats, used by the Qmail MTA (a replacement for sendmail). Similar to MH, except that it adds three subdirectories of the mailbox: tmp, new and cur. Filenames for the messages are chosen in such a way they are unique, even when two programs are writing the mailbox over NFS, which means that no file locking is needed and corruption is very unlikely. Maildir may be slower to open without caching in Mutt; like MH, it is not very disk-space efficient depending on the environment. Since no additional files are used for metadata (which is embedded in the message filenames) and Maildir is locking-free, it's easy to sync across different machines us