Configuration file format -- Attributes

ht://Dig Copyright © 1995-2004 The ht://Dig Group
Please see the file COPYING for license information.


Alphabetical list of attributes


accents_db
type:
string
used by:
htfuzzy, htsearch
default:
${database_base}.accents.db
block:
Global
version:
all
description:
The database file used for the fuzzy "accents" search algorithm. This database is created by htfuzzy and used by htsearch.
example:
accents_db: ${database_base}.uml.db

accept_language
type:
string list
used by:
htdig
default:
No default
block:
Server
version:
3.2.0b4 or later
description:
This attribute allows you to restrict the set of natural languages that are preferred in the response to an HTTP request made by the digger. List one or more language tags (as defined by RFC 1766) in order of preference, separated by spaces. When the server performs content negotiation based on the Accept-Language header sent by the HTTP user agent, it can then return different content depending on the value of this attribute. If set to an empty list, no language will be sent and the server's default content will be returned.
example:
accept_language: en-us en it
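To show how such a list maps onto HTTP, here is a minimal Python sketch that turns a space-separated accept_language value into an Accept-Language request line. It assumes the tags are joined with commas in the given order and sent without quality values; how htdig formats the header internally is not described here, and the helper name is hypothetical.

```python
from typing import Optional


def accept_language_header(value: str) -> Optional[str]:
    """Build an Accept-Language header line from a space-separated tag list.

    Hypothetical helper: assumes tags are joined with ", " in preference
    order and sent without q-values.
    """
    tags = value.split()
    if not tags:
        # Empty list: send no header, so the server falls back to its default.
        return None
    return "Accept-Language: " + ", ".join(tags)


print(accept_language_header("en-us en it"))
```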

add_anchors_to_excerpt
type:
boolean
used by:
htsearch
default:
true
block:
Global
version:
3.1.0 or later
description:
If set to true, the first occurrence of each matched word in the excerpt will be linked to the closest anchor in the document. This only has an effect if the EXCERPT variable is used in the output template and the excerpt is actually displayed.
example:
add_anchors_to_excerpt: no
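One way to picture the "closest anchor" lookup is the Python sketch below. It assumes "closest" means the nearest named anchor at or before the matched word, and that anchors are available as sorted (offset, name) pairs; both the data layout and the function are hypothetical, not htsearch's internals.

```python
import bisect
from typing import List, Optional, Tuple


def closest_anchor(anchors: List[Tuple[int, str]], word_offset: int) -> Optional[str]:
    """Return the name of the last anchor at or before word_offset.

    anchors: (offset, name) pairs sorted by offset (hypothetical layout).
    Returns None if no anchor precedes the word.
    """
    offsets = [offset for offset, _ in anchors]
    # bisect_right finds the first anchor strictly after the word.
    i = bisect.bisect_right(offsets, word_offset)
    return anchors[i - 1][1] if i > 0 else None


anchors = [(0, "top"), (120, "install"), (480, "usage")]
print(closest_anchor(anchors, 300))  # install
```

The link in the excerpt would then point at the document URL followed by `#install`.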

allow_double_slash
type:
boolean
used by:
htdig
default:
false
block:
Global
version:
3.2.0b4 or later
description:
If set to true, strings of multiple slashes ('/') in URL paths will be left intact, rather than being collapsed. This is necessary for some search engine URLs which use slashes to separate fields rather than to separate directory components. However, it can lead to multiple database entries referring to the same file, and it causes '/foo//../' to be equivalent to '/foo/', rather than to '/'.
example:
allow_double_slash: true
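The '/foo//../' case follows from resolving '..' after the slashes are (or are not) collapsed. The Python sketch below reproduces that behaviour for absolute paths; it is an illustration of the rule described above, not htdig's URL-parsing code, and the function name is hypothetical.

```python
import re


def normalize_path(path: str, allow_double_slash: bool) -> str:
    """Resolve '.' and '..' segments in an absolute URL path (sketch)."""
    if not allow_double_slash:
        # Collapse runs of slashes into one before resolving '..'.
        path = re.sub(r"/{2,}", "/", path)
    segments = path.split("/")[1:]  # drop the empty piece before the leading '/'
    out = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == "..":
            if out:
                out.pop()  # '..' removes the previous segment, even an empty one
            if last:
                out.append("")  # keep a trailing slash
        elif seg == ".":
            if last:
                out.append("")
        else:
            out.append(seg)
    return "/" + "/".join(out)


print(normalize_path("/foo//../", True))   # /foo/
print(normalize_path("/foo//../", False))  # /
```

With double slashes kept, the empty segment between the two slashes is what '..' removes, which is why the result stays at '/foo/'.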

allow_in_form
type:
string list
used by:
htsearch
default:
No default
block:
Global
version:
3.1.0 or later
description:
Allows the specified config file attributes to be specified in search forms as separate fields. This could be used to allow form writers to design their own headers and footers and specify them in the search form. Another example would be to offer a menu of search_algorithm settings in the form, for example:
  <SELECT NAME="search_algorithm">
  <OPTION VALUE="exact:1 prefix:0.6 synonyms:0.5 endings:0.1" SELECTED>fuzzy
  <OPTION VALUE="exact:1">exact
  </SELECT>
The general idea behind this is to make an input parameter out of any configuration attribute that's not already automatically handled by an input parameter. You can even make up your own configuration attribute names, for purposes of passing data from the search form to the results output. You're not restricted to the existing attribute names. The attributes listed in the allow_in_form list will be settable in the search form using input parameters of the same name, and will be propagated to the follow-up search form in the results template using template variables of the same name in upper-case. You can also make select lists out of any of these input parameters, in the follow-up search form, using the build_select_lists configuration attribute.
WARNING: Extreme care should be taken with this option, as allowing CGI scripts to set file names can open security holes.
example:
allow_in_form: search_algorithm search_results_header
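The rules above (only listed names are honoured, and each is exposed to the results template under its upper-case name) can be sketched in Python as follows. The helper and the dictionaries standing in for the configuration and the submitted form are hypothetical; htsearch's own CGI handling is not reproduced here.

```python
from typing import Dict


def apply_form_overrides(config: Dict[str, str], form: Dict[str, str],
                         allow_in_form: str) -> Dict[str, str]:
    """Copy whitelisted form fields into config; return template variables.

    Only names in the space-separated allow_in_form list are honoured;
    every other submitted field is ignored.
    """
    template_vars: Dict[str, str] = {}
    for name in allow_in_form.split():
        if name in form:
            config[name] = form[name]
            # Propagated to the follow-up form as an upper-case template variable.
            template_vars[name.upper()] = form[name]
    return template_vars


config = {"search_algorithm": "exact:1"}
form = {"search_algorithm": "exact:1 prefix:0.6", "database_dir": "/tmp"}
print(apply_form_overrides(config, form, "search_algorithm search_results_header"))
```

Because database_dir is not in the list, the submitted value is dropped, which is the kind of protection the warning above relies on.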

allow_numbers
type:
boolean
used by:
htdig, htsearch
default:
false
block:
Global
version:
all
description:
If set to true, numbers are considered words. This means that searches can be done on strings of digits as well as regular words. All the same rules apply to numbers as to words. This does not cause numbers containing a decimal point or commas to be treated as a single entity. When allow_numbers is false, words are still allowed to contain digits, but they must also contain at least one alphabetic character or extra word character. To disallow digits in words, add the digits to valid_punctuation.
example:
allow_numbers: true
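The acceptance rule for a single token can be sketched in Python as below. It assumes the token has already been split out of the text, and ignores other checks such as minimum word length; the extra_word_characters parameter stands in for the "extra word characters" mentioned above, and the function itself is hypothetical.

```python
def is_indexable(token: str, allow_numbers: bool,
                 extra_word_characters: str = "") -> bool:
    """Decide whether an already-split token counts as a word (sketch)."""
    if allow_numbers:
        # Strings of digits are accepted just like ordinary words.
        return True
    # Otherwise the token needs at least one letter or extra word character.
    return any(c.isalpha() or c in extra_word_characters for c in token)


print(is_indexable("1995", False))  # False
print(is_indexable("mp3", False))   # True
```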

allow_space_in_url
type:
boolean
used by:
htdig
default:
false
block:
Global
version:
3.2.0b6 or later
description:
If set to true, htdig will handle URLs that contain embedded spaces. Technically, this is a violation of RFC 2396, which says spaces should be stripped out (as htdig does by default). However, many web browsers and HTML code generators violate this standard already, so enabling this attribute allows htdig to handle these non-compliant URLs. Even with this attribute set, htdig still strips out all white space (leading, trailing and embedded), except that space characters embedded within the URL will be encoded as %20.
example:
allow_space_in_url: true
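The whitespace handling described above amounts to three steps: trim the ends, encode embedded spaces if allowed, and drop any remaining white space. A minimal Python sketch, using a hypothetical helper name and not taken from htdig's source:

```python
def clean_url(url: str, allow_space_in_url: bool) -> str:
    """Apply the white-space rules for URLs described above (sketch)."""
    url = url.strip()  # leading and trailing white space is always removed
    if allow_space_in_url:
        # Embedded space characters survive, encoded as %20.
        url = url.replace(" ", "%20")
    # Any other embedded white space (tabs, newlines, unencoded spaces) is stripped.
    return "".join(c for c in url if not c.isspace())


print(clean_url("  http://example.com/my file.html\n", True))
print(clean_url("  http://example.com/my file.html\n", False))
```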

allow_virtual_hosts
type:
boolean
used by:
htdig
default:
true
block:
Global
version:
3.0.8b2 or later
description:
If set to true, htdig will index virtual web sites as expected. If false, all URL host names will be normalized to the IP address that the DNS server reports for them. If this option is set to false, there is no way to index either "soft" or "hard" virtual web sites.
example:
allow_virtual_hosts: false

anchor_target
type:
string
used by:
htsearch
default:
No default
block:
Global
version:
3.1.6 or later
description:
When the first matched word in the excerpt is linked to the closest anchor in the document, this string can be set to specify a target in the link so the resulting page is displayed in the desired frame. This value will only be used if the add_anchors_to_excerpt attribute is set to true, the EXCERPT variable is used in the output template and the excerpt is actually displayed with a link.
example:
anchor_target: body

any_keywords
type:
boolean
used by:
htsearch
default:
false
block:
Global
version:
3.2.0b2 or later
description:
If set to true, the words in the keywords input parameter in the search form will be joined with logical ORs rather than ANDs, so that any of the words provided will do. Note that this has nothing to do with limiting the search to words in META keywords tags. See the search form documentation for details on this.
example:
any_keywords: yes
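The effect on the keywords parameter can be pictured as choosing the operator that joins its words. The Python sketch below builds such an expression as a string; the helper is hypothetical and does not reproduce htsearch's query parser.

```python
def keywords_query(keywords: str, any_keywords: bool) -> str:
    """Join the words of the keywords input parameter into a boolean expression."""
    operator = " or " if any_keywords else " and "
    return operator.join(keywords.split())


print(keywords_query("linux kernel", True))   # linux or kernel
print(keywords_query("linux kernel", False))  # linux and kernel
```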

author_factor
type:
number
used by:
htsearch
default:
1
block:
Global
version:
3.2.0b4 or later
description:
Weighting applied to words in a <meta name="author" ... > tag.
See also heading_factor.
example:
author_factor: 1

authorization
type:
string
used by:
htdig
default:
No default
block:
URL
version:
3.1.4 or later
description:
This tells htdig to send the supplied username:password with each HTTP request. The credentials will be encoded using the "Basic" authentication scheme. There must be a colon (:) between the username and password.
This attribute can also be specified on htdig's command line using the -u option, and will be blotted out so it won't show up in a process listing. If you use it directly in a configuration file, be sure to protect it so it is readable only by you, and do not use that same configuration file for htsearch.
example:
authorization: myusername:mypassword
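The "Basic" scheme simply Base64-encodes the username:password string. The Python sketch below shows the resulting request header; it illustrates the HTTP scheme, not htdig's code, and the helper name is hypothetical. Note that Base64 is an encoding, not encryption, which is why the configuration file must be protected.

```python
import base64


def basic_auth_header(credentials: str) -> str:
    """Encode "username:password" for the HTTP Basic authentication scheme."""
    if ":" not in credentials:
        raise ValueError("credentials must be in username:password form")
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Authorization: Basic " + token


print(basic_auth_header("myusername:mypassword"))
```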

backlink_factor
type:
number
used by:
htsearch
default:
0.1
block:
Global
version:
3.1.0 or later
description:
This is a weight of "how important" a page is, based on the number of URLs pointing to it. It's actually multiplied by the ratio of the incoming URLs (backlinks) and outgoing URLs (links on the page), to balance out pages with lots of links to pages that link back to them. The ratio gives lower weight to "link farms", which often have many links to them. This factor can be changed without changing the database in any way. However, setting this value to something other than 0 incurs a slowdown on search results.
example:
backlink_factor: 501.1
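Taking the description literally, the contribution is the factor multiplied by the ratio of incoming to outgoing links. The Python sketch below computes that quantity; it is an assumed reading of the text, and htsearch's exact formula, including how it treats pages with no outgoing links, may differ.

```python
def backlink_score(backlink_factor: float, backlinks: int, links: int) -> float:
    """Illustrative backlink contribution: factor * (incoming / outgoing)."""
    # Assumption: treat a page with no outgoing links as having one.
    ratio = backlinks / max(links, 1)
    return backlink_factor * ratio


print(backlink_score(0.1, 20, 4))    # few outgoing links: larger boost
print(backlink_score(0.1, 20, 400))  # many outgoing links: smaller boost
```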

bad_extensions
type: