Note: the \n may need to be protected from your shell.
See also ResultExtFormatName for a way to define named format strings in the swish configuration file.
Format of "formatstring":
"text<propertyname>text<propertyname fmt=propfmtstr>text..."
Where propertyname is:
the name of a user property as specified with the config file directive "PropertyNames"
the name of a swish Auto property (see below). These properties are defined automatically by swish -- you do not need to specify them with the PropertyNames directive. (This may change in the future.)
Property names must be placed within "<" and ">".
User properties:
Swish-e allows you to specify certain META tags within your documents that can be used as document properties. The contents of any META tag that has been identified as a document property can be returned as part of the search results. Document properties must be defined while indexing using the PropertyNames configuration directive (see SWISH-CONFIG).
Examples of user-defined PropertyNames:
<keywords>
<author>
<deliveredby>
<reference>
<id>
Auto properties:
Swish defines a number of "Auto" properties for each document indexed.
These are available for output when using the -x format.
Name               Type     Contents
-----------------  -------  ----------------------------------------------
swishreccount      Integer  Result record counter
swishtitle         String   Document title
swishrank          Integer  Result rank for this hit
swishdocpath       String   URL or filepath to document
swishdocsize       Integer  Document size in bytes
swishlastmodified  Date     Last modified date of document
swishdescription   String   Description of document (see StoreDescription)
swishdbfile        String   Path of swish database indexfile
The Auto properties can also be specified using shortcuts:
Shortcut  Property Name
--------  -----------------
%c        swishreccount
%d        swishdescription
%D        swishlastmodified
%I        swishdbfile
%p        swishdocpath
%r        swishrank
%l        swishdocsize
%t        swishtitle
For example, these are equivalent:
-x '<swishrank>:<swishdocpath>:<swishtitle>\n'
-x '%r:%p:%t\n'
Use a double percent sign "%%" to enter a literal percent sign in the output.
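The shortcut table can be read as a plain substitution. The Python sketch below is an illustration only and is not swish's own parser; the name expand_shortcuts is hypothetical. It rewrites each shortcut into its long form, turns "%%" into a literal percent sign, and copies anything inside "<" and ">" through unchanged (a simplification that ignores backslash escapes).

```python
import re

# Shortcut letters and the Auto properties they stand for (from the table above).
SHORTCUTS = {
    "c": "swishreccount",
    "d": "swishdescription",
    "D": "swishlastmodified",
    "I": "swishdbfile",
    "p": "swishdocpath",
    "r": "swishrank",
    "l": "swishdocsize",
    "t": "swishtitle",
}

# Match either a whole <...> tag or a "%" followed by "%" or a shortcut letter.
_TOKEN = re.compile(r"<[^>]*>|%([%cdDIprlt])")

def expand_shortcuts(fmt):
    """Rewrite %x shortcuts as <propertyname>; "%%" becomes a literal "%"."""
    def repl(match):
        letter = match.group(1)
        if letter is None:
            # A <...> tag: leave it alone, including any fmt attribute inside.
            return match.group(0)
        if letter == "%":
            return "%"
        return "<" + SHORTCUTS[letter] + ">"
    return _TOKEN.sub(repl, fmt)

# Raw strings keep the backslash in \n, as it would be typed on the command line.
print(expand_shortcuts(r"%r:%p:%t\n"))
```

Applied to the second form in the example above, the sketch yields the first form.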
Formatstrings of properties:
Properties listed in an -x format string can include format control strings.
These "propertyformats" are used to control how the contents of the associated property are printed.
Property formats are used like C-language printf formats.
The property format is specified by including the attribute "fmt" within the property tag.
Format strings cannot be used with the "%" shortcuts described above.
General syntax:
-x '<propertyname fmt="propfmtstr">'
where propfmtstr controls the output format of propertyname.
Examples of property format strings:
date type: <swishlastmodified fmt="%d.%m.%Y">
string type: <swishtitle fmt="%-40.35s">
integer type: <swishreccount fmt=/%8.8d/>
Please see the manual pages for strftime(3) and sprintf(3) for an explanation of format strings. Note: some versions of strftime do not offer the %s format string (number of seconds since the Epoch), so swish provides a special format string "%ld" to display the number of seconds since the Epoch.
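Python's % operator and strftime follow the same C conventions for the conversions used in the examples above, so the formats can be previewed without running swish. This is only an approximation of what swish prints, and the sample values are invented for illustration.

```python
from datetime import date

# Hypothetical sample values standing in for real property contents.
title = "Swish-e documentation"
reccount = 7
modified = date(2004, 3, 9)

# %-40.35s: left-justify in a 40-character field, keep at most 35 characters.
title_cell = "%-40.35s" % title

# %8.8d: field width 8 and precision 8, so the integer is zero-padded to 8 digits.
count_cell = "%8.8d" % reccount

# %d.%m.%Y: day.month.year, as formatted by strftime.
date_cell = modified.strftime("%d.%m.%Y")

# Brackets make the padding visible.
print("[%s]" % title_cell)
print("[%s]" % count_cell)   # [00000007]
print("[%s]" % date_cell)    # [09.03.2004]
```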
The first character of a property format string defines the delimiter for the format string. For example,
-x "<author fmt=[%20s]> ...\n"
-x "<author fmt='%20s'> ...\n"
-x "<author fmt=/%20s/> ...\n"
Standard predefined formats:
If you omit the property format, the following formats are used:
String type: "%s" (like printf char *)
Integer type: "%d" (like printf int)
Float type: "%f" (like printf double)
Date type: "%Y-%m-%d %H:%M:%S" (like strftime)
Text in "formatstring" or "propfmtstr":
Text in format strings (and property format strings) is output as-is. Special characters can be escaped with a backslash. To start a new line for each result hit, include the newline escape "\n" at the end of "formatstring".
-x "<swishreccount>|<swishrank>|<swishdocpath>\n"
-x "Count=<swishreccount>, Rank=<swishrank>\n"
-x "Title=\<b\><swishtitle>\</b\>"
-x 'Date: <swishlastmodified fmt="%m/%d/%Y">\n'
-x 'Date in seconds: <swishlastmodified fmt=/%ld/>\n'
Control/Escape characters:
You can use C-like control escapes in the format string:
known controls: \a, \b, \f, \n, \r, \t, \v
digit escapes: \xhexdigits \0octaldigits
character escapes: \anychar
Example:
swish -x "%c\t%r\t%p\t\"<swishtitle fmt=/%40s/>\"\n"
Examples of -x format strings:
-x "%c|%r|%p|%t|%D|%d\n"
-x "%c|%r|%p|%t|<swishlastmodified fmt=/%A, %d. %B %Y/>|%d\n"
-x "<swishrank>\t<swishdocpath>\t<swishtitle>\t<keywords>\n"
-x "xml_out: \<title\><swishtitle>\</title\>\n"
-x "xml_out: <swishtitle fmt='<title>%s</title>'>\n"
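When swish is called from a script, passing the arguments as a list avoids shell quoting of "\n" and "\t" altogether. The following Python sketch shows one way to do that and to split tab-separated output back into fields. Several things here are assumptions rather than facts from this section: the binary name "swish-e", the -w switch for the query words, and the helper names. Titles or paths that themselves contain tabs would break this simple parser.

```python
import subprocess

# Tab-separated fields, one result per line. The raw string passes the
# backslash escapes to swish literally; swish turns \t and \n into characters.
FORMAT = r"%c\t%r\t%p\t%t\n"

def build_search_argv(index_file, query, binary="swish-e"):
    """Return an argument list; no shell is involved, so no shell quoting."""
    # Assumption: -w introduces the search words; -f, -H and -x are as described above.
    return [binary, "-f", index_file, "-w", query, "-H", "0", "-x", FORMAT]

def parse_result_line(line):
    """Split one output line produced with FORMAT into a dict."""
    reccount, rank, path, title = line.rstrip("\n").split("\t", 3)
    return {"reccount": int(reccount), "rank": int(rank),
            "path": path, "title": title}

def search(index_file, query):
    """Run the search and return parsed results (needs swish installed)."""
    proc = subprocess.run(build_search_argv(index_file, query),
                          capture_output=True, text=True, check=True)
    # Skip any line that is not a result row, such as status or error lines.
    return [parse_result_line(line) for line in proc.stdout.splitlines()
            if line.count("\t") >= 3]
```

Only search() needs swish to be present; the other two helpers can be tried on their own with a made-up output line.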
The -H n switch generates extended header output. This is most useful when searching more than one
index file at a time by specifying more than one index file with the -f switch.
-H 2 will generate a set of headers specific to each index file.
This gives access to the settings used to generate each index file.
Even when searching a single index file, -H n will provide additional information about the index file,
how it was indexed, and how swish is interpreting the query.
-H 0 : print no header information, output only search result entries.
-H 1 : print standard result header (default).
-H 2 : print additional header information for each searched index file.
-H 3 : enhanced header output (e.g. print stopwords).
-H 9 : print diagnostic information in the header of the results (changed from -v 4).
This is an experimental feature!
The default ranking scheme in SWISH-E evaluates each word in a query in terms of its frequency and position in each document. The default scheme is 0.
New in version 2.4.3: you may optionally select an experimental ranking scheme that, in addition to the frequency and position of each word in a document, uses two further measures. Inverse Document Frequency (IDF) is the relative frequency of each word across all the indexes being searched. Relative Density is the frequency of a word normalized by the number of words in the document.
NOTE: IgnoreTotalWordCountWhenRanking must be set to no or 0 in your index(es) for -R 1 to work.
Specify -R 1 to turn on IDF ranking. See the API documentation for how to set the ranking scheme in your Perl or C program.
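To make the two terms concrete, here is a small Python sketch using their common textbook definitions. It is not the formula swish uses internally, which this section does not give; the function names and sample numbers are hypothetical.

```python
import math

def inverse_document_frequency(total_docs, docs_with_word):
    """Textbook IDF: words found in fewer documents get a higher weight."""
    return math.log(total_docs / docs_with_word)

def relative_density(word_count_in_doc, words_in_doc):
    """Occurrences of a word divided by the length of the document."""
    return word_count_in_doc / words_in_doc

# A word found in 10 of 1000 documents outweighs one found in 500 of 1000.
rare = inverse_document_frequency(1000, 10)
common = inverse_document_frequency(1000, 500)
print(rare > common)

# Five occurrences count for more in a 100-word page than in a 10000-word page.
print(relative_density(5, 100) > relative_density(5, 10000))
```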
Print the current version.
The -k switch is used for testing and will cause swish to print out all keywords
in the index beginning with the given letter. You may enter -k '*' to generate a list of all words indexed
by swish.
The -D option is no longer supported in version 2.2.
The -T option is used to print out information that may be helpful when debugging swish-e's
operation. This option replaced the -D option of previous versions.
Running -T help will print out a list of available options.
In previous versions of Swish-e, indexing required a very large amount of memory and could be very slow. Merging provided a way to index in chunks and then combine the indexes into a single index.
Indexing is much faster now and uses much less memory, and with the -e switch very little memory is
needed to index a large site.
Still, at times it can be useful to merge different index files into one file for searching. This could be because you want to keep separate site indexes and a common one for a global search, or you have separate collections of documents that you wish to search all at one time, but manage separately.
Merges the indexes specified on the command line -- the last file name entered is the output file. The output index must not exist (otherwise merge will not proceed).
Only indexes that were indexed with common settings may be merged (e.g. don't mix stemming and non-stemming indexes, or indexes with different WordCharacter settings).
Use the -e switch while merging to reduce memory usage.
Merge generates progress messages regardless of the setting of -v.
Specify a configuration file while indexing to add administrative information to the output index file.
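The merge rules above (input indexes first, output file last, output must not already exist, -e to save memory) can be checked before swish is started. The Python sketch below builds the argument list under those rules. The -M switch name and the "swish-e" binary name are assumptions, since neither appears in the text above, and build_merge_argv is a hypothetical helper.

```python
import os

def build_merge_argv(input_indexes, output_index, economy=True, binary="swish-e"):
    """Return the argument list for a merge; refuse if the output already exists."""
    if os.path.exists(output_index):
        # Merge will not proceed when the output index exists, so fail early.
        raise FileExistsError(output_index)
    argv = [binary]
    if economy:
        argv.append("-e")   # reduce memory usage while merging
    # Assumption: -M is the merge switch; the last file name is the output file.
    argv += ["-M", *input_indexes, output_index]
    return argv
```

The returned list can be handed to subprocess.run(argv, check=True) on a machine where swish is installed.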