|
|
NoContents .gif .xbm .au
ReplaceRules [replace|remove|prepend|append|regex]
ResultExtFormatName name -x format string
SpiderDirectory *path*
StoreDescription [XML <tag>|HTML <meta>|TXT size]
SwishProgParameters *list of parameters*
SwishSearchDefaultRule [<AND-WORD>|<or-word>]
SwishSearchOperators <and-word> <or-word> <not-word>
TmpDir *path*
TranslateCharacters [*string1 string2*|:ascii7:]
TruncateDocSize *number of characters*
UndefinedMetaTags [error|ignore|INDEX|auto]
UndefinedXMLAttributes [DISABLE|error|ignore|index|auto]
UseStemming [yes|NO]
UseSoundex [yes|NO]
UseWords [*list of words*|File: path]
WordCharacters *string of characters*
XMLClassAttributes *list of XML attribute names*
[ TOC ]
These configuration directives control the general behavior of Swish-e.
- IncludeConfigFile *path to config file*
-
This directive can be used to include configuration directives located in
another file.
|
|
IncludeConfigFile /usr/local/swish/conf/site_config.config
|
- IndexReport [0|1|2|3]
-
Sets how detailed the reporting is while indexing. Valid values are
0 to 3: 0 is completely silent and 3 is the most verbose. The default is
1.
This may be overridden from the command line via the -v switch (see
SWISH-RUN).
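As a small illustration, a configuration file that asks for the most detailed report might contain the following line. This is only a sketch; the chosen level is arbitrary.

```
# Report as much detail as possible while indexing
IndexReport 3
```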
- ParserWarnLevel [0|1|2|3]
-
Sets the error level when using the libxml2 parser for XML and HTML.
libxml2 will point out structural errors in your documents.
|
|
0 = no report
1 = fatal errors
2 = errors
3 = warnings
|
The exception is that UTF-8 to Latin-1 conversion errors are reported at
level 1, because words may be indexed incorrectly in these cases.
Note that unlike other errors generated by Swish-e, these errors are sent
to stderr.
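A minimal sketch of how this might look in a configuration file follows. It assumes that .html and .htm files are assigned to the libxml2 HTML parser (HTML2) with IndexContents; the suffixes and the chosen level are illustrative only.

```
# Parse these files with the libxml2 HTML parser (suffixes are an assumption)
IndexContents HTML2 .html .htm
# Report at level 2 (errors)
ParserWarnLevel 2
```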
- IndexFile *path*
-
IndexFile specifies the location of the generated index file. If not
specified, Swish-e creates the file index.swish-e in the current directory.
|
|
IndexFile /usr/local/swish/site.index
|
- obeyRobotsNoIndex [yes|NO]
-
When enabled, Swish-e will not index any HTML file that contains:
|
|
<meta name="robots" content="noindex">
|
The default is to ignore these meta tags and index the document. This tag
is described at http://www.robotstxt.org/wc/exclusion.html.
Note: This feature is only available with the libxml2 HTML parser.
Also, if you are using the libxml2 parser (HTML2 and XML2) then you can use
the following comments in your documents to prevent indexing:
|
|
<!-- SwishCommand noindex -->
<!-- SwishCommand index -->
|
These shorter forms may also be used:
|
|
<!-- noindex -->
<!-- index -->
|
For example, these are very helpful to prevent indexing of common headers,
footers, and menus.
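Putting the pieces together, a configuration that honors the robots meta tag could look like the sketch below. Because the feature requires the libxml2 HTML parser, the example first assigns the HTML2 type to the files; the file suffixes are an assumption for illustration.

```
# Use the libxml2 HTML parser for these files (suffixes are an assumption)
IndexContents HTML2 .html .htm
# Skip any HTML file that carries the robots "noindex" meta tag
obeyRobotsNoIndex yes
```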
NOTE: The following items are currently not available. These items require
Swish-e to parse the configuration file while searching.
- EnableAltSearchSyntax [yes|NO]
-
NOTE: The following item is currently not available.
Enables an alternate search syntax, similar to the basic syntax of
"Altavista(c)", "Lycos(c)", and other search engines.
A search query can then use "+" and "-" as prefixes on words.
Example:
|
|
swish-e -w "+word1 +word2 -word3 word4 word5"
"+" = following word has to be in all found documents
"-" = following word may not be in any document found
" " = following word will be searched in documents
|
- SwishSearchOperators <and-word> <or-word> <not-word>
-
NOTE: The following item is currently not available.
With this directive you can change the Boolean search operators of
Swish-e, e.g. to adapt them to your language. The default is: AND OR NOT
Example (German):
|
|
SwishSearchOperators UND ODER NICHT
|
- SwishSearchDefaultRule [<AND-WORD>|<or-word>]
-
NOTE: The following item is currently not available.
SwishSearchDefaultRule defines the default Boolean operator to use if none is specified between
words or phrases. The default is AND.
The word you specify must match one of the available SwishSearchOperators.
Example:
|
|
SwishSearchOperators UND ODER NICHT
# Make it act like a web search engine
SwishSearchDefaultRule ODER
|
- ResultExtFormatName name -x format string
-
NOTE: The following item is currently not available.
The output of Swish-e can be defined by specifying a format string with the
-x command line argument. Using ResultExtFormatName you can assign a predefined format string to a name.
Examples:
|
|
ResultExtFormatName moreinfo "%c|%r|%t|%p|<author>|<publishyear>\n"
|
Then, when searching, you can specify the format string's name:
|
|
swish-e ... -x moreinfo ...
|
See the -x switch in SWISH-RUN for more information about output formats.
[ TOC ]
Swish-e stores configuration information in the header of the index file.
This information can be retrieved while searching or by functions in the
Swish-e C library. There are a number of fields available for your own use.
None of these fields are required:
- IndexName *text*
-
- IndexDescription *text*
-
- IndexPointer *text*
-
- IndexAdmin *text*
-
These directives specify information that goes into the index file to help
users and administrators. IndexName should be the name of your index, like
a book title. IndexDescription is a short description of the index or a URL
pointing to a fuller description. IndexPointer should be a pointer to
the original information, most likely a URL. IndexAdmin should be the name
of the index maintainer and can include name and email information. These
values should be no longer than about 70 characters and should be enclosed
in quotes. Note that the automatically generated date in index files is in
D/M/Y and 24-hour format.
Examples:
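As an illustration, the four directives might be set as in the sketch below. All values (the index name, description, URL, and maintainer address) are hypothetical and only show the quoting and the suggested length.

```
IndexName "Example Site Documentation"
IndexDescription "Index of the documentation pages on www.example.com"
IndexPointer "http://www.example.com/docs/"
IndexAdmin "Jane Doe (jane@example.com)"
```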
[ TOC ]
These directives control what documents are indexed and how they are accessed. See also Directives for the File Access method only and Directives for the HTTP Access Method Only for directives that are specific to those access methods.
- IndexDir [directories or files|URL|external program]
-
IndexDir defines the source of the documents for Swish-e. Swish-e currently
supports three file access methods: File system, HTTP
(also called spidering), and prog for reading files from an external program.
The -S command line argument is used to select the file access method.
|
|
swish-e -c swish.config -S fs - file system
swish-e -c swish.config -S http - internal http spider
swish-e -c swish.config -S prog - external program of any type
|
For the fs method of access IndexDir is a space-separated list of files and directories to index. Use a forward
slash as the path separator in MS Windows.
For the http method the IndexDir setting is a list of space-separated URLs.
For the prog method the IndexDir setting is a list of space-separated programs to run (which generate
documents for swish to index).
You may specify more than one IndexDir directive.
Any sub-directories of any listed directory will also be indexed.
Note: While processing directories, Swish-e will ignore any files or directories that begin with a
dot ("."). You may index files or directories that begin with a
dot by specifying their name with IndexDir or -i.
Examples:
|
|
# Index this directory and any subdirectories
IndexDir /usr/local/home/http
|
|
|
# Index the docs directory in current directory
IndexDir ./docs
|
|
|
# Index these files in the current directory
IndexDir ./index.html ./page1.html ./page2.html
# and index this directory, too
IndexDir ../public_html
|
For the HTTP method of access, specify the URLs from which you want the spidering to
begin.
Example:
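A minimal sketch, assuming a hypothetical site at www.example.com and that Swish-e is run with the -S http switch:

```
# Start spidering at this page (hypothetical URL)
IndexDir http://www.example.com/index.html
```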
Indexing with the HTTP method is much slower than indexing local files. Be aware that some sites do not
appreciate spidering and may block your IP address. You may wish to contact
the remote site before spidering it. More information about
spidering can be found in
Directives for the HTTP Access Method Only below.
For the prog method of access IndexDir specifies the path to the program(s) to execute. The external
program must correctly format the documents being passed back to Swish-e.
Examples of external programs are provided in the prog-bin directory.
See prog for details.
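For the prog method, a configuration might look like the sketch below. The program path is hypothetical; it stands for any program that writes correctly formatted documents for Swish-e to read, and it assumes Swish-e is run with the -S prog switch.

```
# Run this program and index the documents it outputs (hypothetical path)
IndexDir ./my_document_feed.pl
```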
Note: Not all directives work with all methods.
- NoContents *list of file suffixes*
-
Files with these suffixes will not have their contents indexed, but will have their path name (file name)
indexed instead.
If the file's type is HTML or HTML2 (as set by IndexContents or
DefaultContents) then the file will be parsed for an HTML title and that title will be
indexed. Note that you must set the file's type with
IndexContents or DefaultContents: .html and .htm are NOT type HTML by default. For example:
|
|
IndexContents HTML* .htm .html
|
If a title is found, it will still be checked for FileRules title, and the file will be skipped if a match is found. See FileRules.
If the file's type is not HTML, or it is HTML and no title is found, then
the file's path will be indexed.
For example, this will allow searching by image file name.
|
|
NoContents .gif .xbm .au
-
ResultExtFormatName name -x format string
-
SpiderDirectory *path*
-
StoreDescription [XML <tag>|HTML <meta>|TXT size]
-
"SwishProgParameters *list of parameters*
-
SwishSearchDefaultRule [<AND-WORD>|<or-word>]
-
SwishSearchOperators <and-word> <or-word> <not-word>
-
TmpDir *path*
-
TranslateCharacters [*string1 string2*|:ascii7:]
-
TruncateDocSize *number of characters*
-
UndefinedMetaTags [error|ignore|INDEX|auto]
-
UndefinedXMLAttributes [DISABLE|error|ignore|index|auto]
-
UseStemming [yes|NO]
-
UseSoundex [yes|NO]
-
UseWords [*list of words*|File: path]
-
WordCharacters *string of characters*
-
XMLClassAttributes *list of XML attribute names*
[ TOC ]
These configuration directives control the general behavior of Swish-e.
- IncludeConfigFile *path to config file*
-
This directive can be used to include configuration directives located in
another file.
|
|
IncludeConfigFile /usr/local/swish/conf/site_config.config
|
- IndexReport [0|1|2|3]
-
This is how detailed you want reporting while indexing. You can specify
numbers 0 to 3. 0 is totally silent, 3 is the most verbose. The default is
1.
This may be overridden from the command line via the -v switch (see
SWISH-RUN).
- ParserWarnLevel [0|1|2|3]
-
Sets the error level when using the libxml2 parser for XML and HTML.
libxml2 will point out structural errors in your documents.
|
|
0 = no report
1 = fatal errors
2 = errors
3 = warnings
|
The exception to this is UTF-8 to Latin-1 conversion errors are reported at
level 1. This is because words may be indexed incorrectly in these cases.
Note that unlike other errors generated by Swish-e, these errors are sent
to stderr.
- IndexFile *path*
-
Index file specifies the location of the generated index file. If not
specified, Swish-e will create the file index.swish-e in the current directory.
|
|
IndexFile /usr/local/swish/site.index
|
- obeyRobotsNoIndex [yes|NO]
-
When enabled, Swish-e will not index any HTML file that contains:
|
|
<meta name="robots" content="noindex">
|
The default is to ignore these meta tags and index the document. This tag
is described at http://www.robotstxt.org/wc/exclusion.html.
Note: This feature is only available with the libxml2 HTML parser.
Also, if you are using the libxml2 parser (HTML2 and XML2) then you can use
the following comments in your documents to prevent indexing:
|
|
<!-- SwishCommand noindex -->
<!-- SwishCommand index -->
|
and/or these may be used also:
|
|
<!-- noindex -->
<!-- index -->
|
For example, these are very helpful to prevent indexing of common headers,
footers, and menus.
NOTE: This following items are currently not available. These items require
Swish-e to parse the configuration file while searching.
- EnableAltSearchSyntax [yes|NO]
-
NOTE: This following item is currently not available.
Enable alternate search syntax. Allows the usage of a basic
"Altavista(c)", "Lycos(c)", etc. like search syntax.
This means a search query can contain "+" and "-" as
syntax parameter.
Example:
|
|
swish-e -w "+word1 +word2 -word3 word4 word5"
"+" = following word has to be in all found documents
"-" = following word may not be in any document found
" " = following word will be searched in documents
|
- SwishSearchOperators <and-word> <or-word> <not-word>
-
NOTE: This following item is currently not available.
Using this config directive you can change the boolean search operators of
Swish-e, e.g. to adapt these to your language. The default is: AND OR NOT
Example (german):
|
|
SwishSearchOperators UND ODER NICHT
|
- SwishSearchDefaultRule [<AND-WORD>|<or-word>]
-
NOTE: This following item is currently not available.
SwishSearchDefaultRule defines the default Boolean operator to use if none is specified between
words or phrases. The default is AND.
The word you specify must match one of the available SwishSearchOperators.
Example:
|
|
SwishSearchOperators UND ODER NICHT
# Make it act like a web search engine
SwishSearchDefaultRule ODER
|
- ResultExtFormatName name -x format string
-
NOTE: This following item is currently not available.
The output of Swish-e can be defined by specifying a format string with the
-x command line argument. Using ResultExtFormatName you can assign a predefined format string to a name.
Examples:
|
|
ResultExtFormatName moreinfo "%c|%r|%t|%p|<author>|<publishyear>\n"
|
Then when searching you can specify the format string's name
|
|
swish-e ... -x moreinfo ...
|
See the -x switch in SWISH-RUN for more information about output formats.
[ TOC ]
Swish-e stores configuration information in the header of the index file.
This information can be retrieved while searching or by functions in the
Swish-e C library. There are a number of fields available for your own use.
None of these fields are required:
- IndexName *text*
-
- IndexDescription *text*
-
- IndexPointer *text*
-
- IndexAdmin *text*
-
These variables specify information that goes into index files to help
users and administrators. IndexName should be the name of your index, like
a book title. IndexDescription is a short description of the index or a URL
pointing to a more full description. IndexPointer should be a pointer to
the original information, most likely a URL. IndexAdmin should be the name
of the index maintainer and can include name and email information. These
values should not be more than 70 or so characters and should be contained
in quotes. Note that the automatically generated date in index files is in
D/M/Y and 24-hour format.
Examples:
[ TOC ]
These directives control what documents are indexed and how they are accessed. See also Directives for the File Access method only and Directives for the HTTP Access Method Only for directives that are specific to those access methods.
- IndexDir [directories or files|URL|external program]
-
IndexDir defines the source of the documents for Swish-e. Swish-e currently
supports three file access methods: File system, HTTP
(also called spidering), and prog for reading files from an external program.
The -S command line argument is used to select the file access method.
|
|
swish-e -c swish.config -S fs - file system
swish-e -c swish.config -S http - internal http spider
swish-e -c swish.config -S prog - external program of any type
|
For the fs method of access IndexDir is a space-separated list of files and directories to index. Use a forward
slash as the path separator in MS Windows.
For the http method the IndexDir setting is a list of space-separated URLs.
For the prog method the IndexDir setting is a list of space-separated programs to run (which generate
documents for swish to index).
You may specify more than one IndexDir directive.
Any sub-directories of any listed directory will also be indexed.
Note: While processing directories, Swish-e will ignore any files or directories that begin with a
dot ("."). You may index files or directories that begin with a
dot by specifying their name with IndexDir or -i.
Examples:
|
|
# Index this directory an any subdirectories
IndexDir /usr/local/home/http
|
|
|
# Index the docs directory in current directory
IndexDir ./docs
|
|
|
# Index these files in the current directory
IndexDir ./index.html ./page1.html ./page2.html
# and index this directory, too
IndexDir ../public_html
|
For the HTTP method of access specify the URL's from which you want the spidering to
begin.
Example:
Obviously, using the HTTP method to index is much slower than indexing local files. Be well aware that some sites do not
appreciate spidering and may block your IP address. You may wish to contact
the remote site before spidering their web site. More information about
spidering can be found in
Directives for the HTTP Access Method Only below.
For the prog method of access IndexDir specifies the path to the program(s) to execute. The external
program must correctly format the documents being passed back to Swish-e.
Examples of external programs are provided in the prog-bin directory.
See prog for details.
Note: Not all directives work with all methods.
- NoContents *list of file suffixes*
-
Files with these suffixes will not have their contents indexed, but will have their path name (file name)
indexed instead.
If the file's type is HTML or HTML2 (as set by IndexContents or
DefaultContents) then the file will be parsed for a HTML title and that title will be
indexed. Note that you must set the file's type with
IndexContents or DefaultContents: .html and .htm are NOT type HTML by default. For example:
|
|
IndexContents HTML* .htm .html
|
If a title is found, it will still be checked for FileRules title, and the file will be skipped if a match is found. See FileRules.
If the file's type is not HTML, or it is HTML and no title is found, then
the file's path will be indexed.
For example, this will allow searching by image file name.
|
|
NoContents .gif .xbm .au .place|remove|prepend|append|regex]
-
ResultExtFormatName name -x format string
-
SpiderDirectory *path*
-
StoreDescription [XML <tag>|HTML <meta>|TXT size]
-
"SwishProgParameters *list of parameters*
-
SwishSearchDefaultRule [<AND-WORD>|<or-word>]
-
SwishSearchOperators <and-word> <or-word> <not-word>
-
TmpDir *path*
-
TranslateCharacters [*string1 string2*|:ascii7:]
-
TruncateDocSize *number of characters*
-
UndefinedMetaTags [error|ignore|INDEX|auto]
-
UndefinedXMLAttributes [DISABLE|error|ignore|index|auto]
-
UseStemming [yes|NO]
-
UseSoundex [yes|NO]
-
UseWords [*list of words*|File: path]
-
WordCharacters *string of characters*
-
XMLClassAttributes *list of XML attribute names*
[ TOC ]
These configuration directives control the general behavior of Swish-e.
- IncludeConfigFile *path to config file*
-
This directive can be used to include configuration directives located in
another file.
|
|
IncludeConfigFile /usr/local/swish/conf/site_config.config
|
- IndexReport [0|1|2|3]
-
This is how detailed you want reporting while indexing. You can specify
numbers 0 to 3. 0 is totally silent, 3 is the most verbose. The default is
1.
This may be overridden from the command line via the -v switch (see
SWISH-RUN).
- ParserWarnLevel [0|1|2|3]
-
Sets the error level when using the libxml2 parser for XML and HTML.
libxml2 will point out structural errors in your documents.
|
|
0 = no report
1 = fatal errors
2 = errors
3 = warnings
|
The exception to this is UTF-8 to Latin-1 conversion errors are reported at
level 1. This is because words may be indexed incorrectly in these cases.
Note that unlike other errors generated by Swish-e, these errors are sent
to stderr.
- IndexFile *path*
-
Index file specifies the location of the generated index file. If not
specified, Swish-e will create the file index.swish-e in the current directory.
|
|
IndexFile /usr/local/swish/site.index
|
- obeyRobotsNoIndex [yes|NO]
-
When enabled, Swish-e will not index any HTML file that contains:
|
|
<meta name="robots" content="noindex">
|
The default is to ignore these meta tags and index the document. This tag
is described at http://www.robotstxt.org/wc/exclusion.html.
Note: This feature is only available with the libxml2 HTML parser.
Also, if you are using the libxml2 parser (HTML2 and XML2) then you can use
the following comments in your documents to prevent indexing:
|
|
<!-- SwishCommand noindex -->
<!-- SwishCommand index -->
|
and/or these may be used also:
|
|
<!-- noindex -->
<!-- index -->
|
For example, these are very helpful to prevent indexing of common headers,
footers, and menus.
NOTE: This following items are currently not available. These items require
Swish-e to parse the configuration file while searching.
- EnableAltSearchSyntax [yes|NO]
-
NOTE: This following item is currently not available.
Enable alternate search syntax. Allows the usage of a basic
"Altavista(c)", "Lycos(c)", etc. like search syntax.
This means a search query can contain "+" and "-" as
syntax parameter.
Example:
|
|
swish-e -w "+word1 +word2 -word3 word4 word5"
"+" = following word has to be in all found documents
"-" = following word may not be in any document found
" " = following word will be searched in documents
|
- SwishSearchOperators <and-word> <or-word> <not-word>
-
NOTE: This following item is currently not available.
Using this config directive you can change the boolean search operators of
Swish-e, e.g. to adapt these to your language. The default is: AND OR NOT
Example (german):
|
|
SwishSearchOperators UND ODER NICHT
|
- SwishSearchDefaultRule [<AND-WORD>|<or-word>]
-
NOTE: This following item is currently not available.
SwishSearchDefaultRule defines the default Boolean operator to use if none is specified between
words or phrases. The default is AND.
The word you specify must match one of the available SwishSearchOperators.
Example:
|
|
SwishSearchOperators UND ODER NICHT
# Make it act like a web search engine
SwishSearchDefaultRule ODER
|
- ResultExtFormatName name -x format string
-
NOTE: This following item is currently not available.
The output of Swish-e can be defined by specifying a format string with the
-x command line argument. Using ResultExtFormatName you can assign a predefined format string to a name.
Examples:
|
|
ResultExtFormatName moreinfo "%c|%r|%t|%p|<author>|<publishyear>\n"
|
Then when searching you can specify the format string's name
|
|
swish-e ... -x moreinfo ...
|
See the -x switch in SWISH-RUN for more information about output formats.
[ TOC ]
Swish-e stores configuration information in the header of the index file.
This information can be retrieved while searching or by functions in the
Swish-e C library. There are a number of fields available for your own use.
None of these fields are required:
- IndexName *text*
-
- IndexDescription *text*
-
- IndexPointer *text*
-
- IndexAdmin *text*
-
These variables specify information that goes into index files to help
users and administrators. IndexName should be the name of your index, like
a book title. IndexDescription is a short description of the index or a URL
pointing to a more full description. IndexPointer should be a pointer to
the original information, most likely a URL. IndexAdmin should be the name
of the index maintainer and can include name and email information. These
values should not be more than 70 or so characters and should be contained
in quotes. Note that the automatically generated date in index files is in
D/M/Y and 24-hour format.
Examples:
[ TOC ]
These directives control what documents are indexed and how they are accessed. See also Directives for the File Access method only and Directives for the HTTP Access Method Only for directives that are specific to those access methods.
- IndexDir [directories or files|URL|external program]
-
IndexDir defines the source of the documents for Swish-e. Swish-e currently
supports three file access methods: File system, HTTP
(also called spidering), and prog for reading files from an external program.
The -S command line argument is used to select the file access method.
|
|
swish-e -c swish.config -S fs - file system
swish-e -c swish.config -S http - internal http spider
swish-e -c swish.config -S prog - external program of any type
|
For the fs method of access IndexDir is a space-separated list of files and directories to index. Use a forward
slash as the path separator in MS Windows.
For the http method the IndexDir setting is a list of space-separated URLs.
For the prog method the IndexDir setting is a list of space-separated programs to run (which generate
documents for swish to index).
You may specify more than one IndexDir directive.
Any sub-directories of any listed directory will also be indexed.
Note: While processing directories, Swish-e will ignore any files or directories that begin with a
dot ("."). You may index files or directories that begin with a
dot by specifying their name with IndexDir or -i.
Examples:
|
|
# Index this directory an any subdirectories
IndexDir /usr/local/home/http
|
|
|
# Index the docs directory in current directory
IndexDir ./docs
|
|
|
# Index these files in the current directory
IndexDir ./index.html ./page1.html ./page2.html
# and index this directory, too
IndexDir ../public_html
|
For the HTTP method of access specify the URL's from which you want the spidering to
begin.
Example:
Obviously, using the HTTP method to index is much slower than indexing local files. Be well aware that some sites do not
appreciate spidering and may block your IP address. You may wish to contact
the remote site before spidering their web site. More information about
spidering can be found in
Directives for the HTTP Access Method Only below.
For the prog method of access IndexDir specifies the path to the program(s) to execute. The external
program must correctly format the documents being passed back to Swish-e.
Examples of external programs are provided in the prog-bin directory.
See prog for details.
Note: Not all directives work with all methods.
- NoContents *list of file suffixes*
-
Files with these suffixes will not have their contents indexed, but will have their path name (file name)
indexed instead.
If the file's type is HTML or HTML2 (as set by IndexContents or
DefaultContents) then the file will be parsed for a HTML title and that title will be
indexed. Note that you must set the file's type with
IndexContents or DefaultContents: .html and .htm are NOT type HTML by default. For example:
|
|
IndexContents HTML* .htm .html
|
If a title is found, it will still be checked for FileRules title, and the file will be skipped if a match is found. See FileRules.
If the file's type is not HTML, or it is HTML and no title is found, then
the file's path will be indexed.
For example, this will allow searching by image file name.
|
|
NoContents .gif .xbm .au .place|remove|prepend|append|regex]
-
ResultExtFormatName name -x format string
-
SpiderDirectory *path*
-
StoreDescription [XML <tag>|HTML <meta>|TXT size]
-
"SwishProgParameters *list of parameters*
-
SwishSearchDefaultRule [<AND-WORD>|<or-word>]
-
SwishSearchOperators <and-word> <or-word> <not-word>
-
TmpDir *path*
-
TranslateCharacters [*string1 string2*|:ascii7:]
-
TruncateDocSize *number of characters*
-
UndefinedMetaTags [error|ignore|INDEX|auto]
-
UndefinedXMLAttributes [DISABLE|error|ignore|index|auto]
-
UseStemming [yes|NO]
-
UseSoundex [yes|NO]
-
UseWords [*list of words*|File: path]
-
WordCharacters *string of characters*
-
XMLClassAttributes *list of XML attribute names*
[ TOC ]
These configuration directives control the general behavior of Swish-e.
- IncludeConfigFile *path to config file*
-
This directive can be used to include configuration directives located in
another file.
|
|
IncludeConfigFile /usr/local/swish/conf/site_config.config
|
- IndexReport [0|1|2|3]
-
This is how detailed you want reporting while indexing. You can specify
numbers 0 to 3. 0 is totally silent, 3 is the most verbose. The default is
1.
This may be overridden from the command line via the -v switch (see
SWISH-RUN).
- ParserWarnLevel [0|1|2|3]
-
Sets the error level when using the libxml2 parser for XML and HTML.
libxml2 will point out structural errors in your documents.
|
|
0 = no report
1 = fatal errors
2 = errors
3 = warnings
|
The exception to this is UTF-8 to Latin-1 conversion errors are reported at
level 1. This is because words may be indexed incorrectly in these cases.
Note that unlike other errors generated by Swish-e, these errors are sent
to stderr.
- IndexFile *path*
-
Index file specifies the location of the generated index file. If not
specified, Swish-e will create the file index.swish-e in the current directory.
|
|
IndexFile /usr/local/swish/site.index
|
- obeyRobotsNoIndex [yes|NO]
-
When enabled, Swish-e will not index any HTML file that contains:
|
|
<meta name="robots" content="noindex">
|
The default is to ignore these meta tags and index the document. This tag
is described at http://www.robotstxt.org/wc/exclusion.html.
Note: This feature is only available with the libxml2 HTML parser.
Also, if you are using the libxml2 parser (HTML2 and XML2) then you can use
the following comments in your documents to prevent indexing:
|
|
<!-- SwishCommand noindex -->
<!-- SwishCommand index -->
|
and/or these may be used also:
|
|
<!-- noindex -->
<!-- index -->
|
For example, these are very helpful to prevent indexing of common headers,
footers, and menus.
NOTE: This following items are currently not available. These items require
Swish-e to parse the configuration file while searching.
- EnableAltSearchSyntax [yes|NO]
-
NOTE: This following item is currently not available.
Enable alternate search syntax. Allows the usage of a basic
"Altavista(c)", "Lycos(c)", etc. like search syntax.
This means a search query can contain "+" and "-" as
syntax parameter.
Example:
|
|
swish-e -w "+word1 +word2 -word3 word4 word5"
"+" = following word has to be in all found documents
"-" = following word may not be in any document found
" " = following word will be searched in documents
|
- SwishSearchOperators <and-word> <or-word> <not-word>
-
NOTE: This following item is currently not available.
Using this config directive you can change the boolean search operators of
Swish-e, e.g. to adapt these to your language. The default is: AND OR NOT
Example (german):
|
|
SwishSearchOperators UND ODER NICHT
|
- SwishSearchDefaultRule [<AND-WORD>|<or-word>]
-
NOTE: This following item is currently not available.
SwishSearchDefaultRule defines the default Boolean operator to use if none is specified between
words or phrases. The default is AND.
The word you specify must match one of the available SwishSearchOperators.
Example:
|
|
SwishSearchOperators UND ODER NICHT
# Make it act like a web search engine
SwishSearchDefaultRule ODER
|
- ResultExtFormatName name -x format string
-
NOTE: This following item is currently not available.
The output of Swish-e can be defined by specifying a format string with the
-x command line argument. Using ResultExtFormatName you can assign a predefined format string to a name.
Examples:
|
|
ResultExtFormatName moreinfo "%c|%r|%t|%p|<author>|<publishyear>\n"
|
Then when searching you can specify the format string's name
|
|
swish-e ... -x moreinfo ...
|
See the -x switch in SWISH-RUN for more information about output formats.
[ TOC ]
Swish-e stores configuration information in the header of the index file.
This information can be retrieved while searching or by functions in the
Swish-e C library. There are a number of fields available for your own use.
None of these fields are required:
- IndexName *text*
-
- IndexDescription *text*
-
- IndexPointer *text*
-
- IndexAdmin *text*
-
These variables specify information that goes into index files to help
users and administrators. IndexName should be the name of your index, like
a book title. IndexDescription is a short description of the index or a URL
pointing to a fuller description. IndexPointer should be a pointer to
the original information, most likely a URL. IndexAdmin should be the name
of the index maintainer and can include name and email information. These
values should be no longer than about 70 characters and should be enclosed
in quotes. Note that the automatically generated date in index files is in
D/M/Y and 24-hour format.
Examples:
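A minimal illustration of the four header directives. All values below are placeholders chosen for this sketch, not settings from a real installation:
|
|
IndexName "Site Documentation"
IndexDescription "Index of the HTML documentation under /usr/local/doc"
IndexPointer "http://www.example.com/docs/"
IndexAdmin "Jane Doe, webmaster@example.com"
|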
[ TOC ]
These directives control what documents are indexed and how they are accessed. See also Directives for the File Access method only and Directives for the HTTP Access Method Only for directives that are specific to those access methods.
- IndexDir [directories or files|URL|external program]
-
IndexDir defines the source of the documents for Swish-e. Swish-e currently
supports three file access methods: File system, HTTP
(also called spidering), and prog for reading files from an external program.
The -S command line argument is used to select the file access method.
|
|
swish-e -c swish.config -S fs      # file system
swish-e -c swish.config -S http    # internal http spider
swish-e -c swish.config -S prog    # external program of any type
|
For the fs method of access IndexDir is a space-separated list of files and directories to index. Use a forward
slash as the path separator in MS Windows.
For the http method the IndexDir setting is a list of space-separated URLs.
For the prog method the IndexDir setting is a list of space-separated programs to run (which generate
documents for Swish-e to index).
You may specify more than one IndexDir directive.
Any sub-directories of any listed directory will also be indexed.
Note: While processing directories, Swish-e will ignore any files or directories that begin with a
dot ("."). You may index files or directories that begin with a
dot by specifying their name with IndexDir or -i.
Examples:
|
|
# Index this directory and any subdirectories
IndexDir /usr/local/home/http
|
|
|
# Index the docs directory in current directory
IndexDir ./docs
|
|
|
# Index these files in the current directory
IndexDir ./index.html ./page1.html ./page2.html
# and index this directory, too
IndexDir ../public_html
|
For the HTTP method of access, specify the URLs from which you want the spidering to
begin.
Example:
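As a sketch, spidering could start from one or more URLs. The host names here are placeholders, not sites referenced by this document:
|
|
# Start spidering at these URLs (hypothetical hosts)
IndexDir http://www.example.com/index.html
IndexDir http://localhost/docs/index.html
|
A configuration like this would be run with the -S http switch shown earlier.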
Indexing over HTTP is much slower than indexing local files. Be aware that some sites do not
appreciate spidering and may block your IP address. You may wish to contact
the remote site before spidering their web site. More information about
spidering can be found in
Directives for the HTTP Access Method Only below.
For the prog method of access IndexDir specifies the path to the program(s) to execute. The external
program must correctly format the documents being passed back to Swish-e.
Examples of external programs are provided in the prog-bin directory.
See prog for details.
Note: Not all directives work with all methods.
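A hedged sketch of a prog setup follows. The program path and its argument are hypothetical, and it assumes that SwishProgParameters (listed among the directives above) passes its parameters to the external program:
|
|
# Run a hypothetical external program that prints documents for Swish-e
IndexDir ./prog-bin/fetch_docs.pl
# Assumed: these parameters are handed to the program named in IndexDir
SwishProgParameters /usr/local/data
|
A configuration like this would be run with the -S prog switch shown earlier.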
- NoContents *list of file suffixes*
-
Files with these suffixes will not have their contents indexed, but will have their path name (file name)
indexed instead.
If the file's type is HTML or HTML2 (as set by IndexContents or
DefaultContents) then the file will be parsed for an HTML title and that title will be
indexed. Note that you must set the file's type with
IndexContents or DefaultContents: .html and .htm are NOT type HTML by default. For example:
|
|
IndexContents HTML* .htm .html
|
If a title is found, it is still checked against any FileRules title patterns, and the file is skipped if one matches. See FileRules.
If the file's type is not HTML, or it is HTML and no title is found, then
the file's path will be indexed.
For example, this will allow searching by image file name.
|
|
NoContents .gif .xbm .au
|