CHANGES - List of revisions

Swish-e version 2.4.7

OVERVIEW

This document contains a list of bug fixes and feature additions to Swish-e.

Version 2.4.7 - 4 April 2009

  • Added ReturnRawRank for raw rank score

    Setting ReturnRawRank to a true value will return the rank score unscaled. Can be set with the -a command line option (mnemonic: "a"bsolute rank score).

  • Yanked setenv feature introduced in 2.4.6

    The ranking debugging feature using setenv introduced in 2.4.6 was yanked. Some platforms (notably HP-UX and Windows) lack the setenv feature, and the convenience of setting the env var was not worth the limitations.

Version 2.4.6 - 10 March 2008

  • MinWordLength respected in query parser

    Clark Vent reported that the query parser was not respecting MinWordLength settings. See http://dev.swish-e.org/changeset/2145

  • Patch to file.c.

    The file.c patch was in response to http://swish-e.org/archive/2007-03/11321.html although that user never responded about that patch.

  • SWISH_DEBUG_RANK env var now enables rank debugging

    Set SWISH_DEBUG_RANK to a true value to enable lots of rank debugging on stderr.

  • Perl Makefile.PL patched to fix MakeMaker issue

    Recent versions of ExtUtils::MakeMaker revealed a bug in Makefile.PL. Patch from mschwern via RT, report by mpeters.

  • LARGEFILE support detected automatically in configure

    jrobinson852@yahoo.com suggested that LARGEFILE support be auto-detected, since it is needed so often on Linux systems.

  • New Snowball stemmers

    Trygve Falch contributed patches to update the Snowball stemmers, including new Hungarian and Romanian stemmers.

  • Patched leaks

    Antony Dovgal patched two leaks. In the first, the file name was not freed when a file failed to open.

    In the second, SwishSetSearchLimit() was nulling the search limits when an error was found in the parameters, but not freeing the existing limits.

  • Leak in SwishResetSearchLimit

    Fixed a leak if a limit was set and then reset but not prepared. Patch provided by Antony Dovgal.

  • New API functions added

    Added SwishGetStructure() and SwishGetPhraseDelimiter() functions which return relevant properties of the search object. Patch provided by Antony Dovgal.

Version 2.4.5 - 22 Jan 2007

  • Fixed 'deflate' handling in spider.pl

    spider.pl was using the wrong method to uncompress HTTP responses that were 'deflate' encoded. It now also decodes content based on the document's charset and encodes it back to that charset before outputting.

  • re-indexing required

    The magic numbers in src/swish.h were changed to require re-indexing from version 2.4.4 indexes. This should have been done in 2.4.4 as well, and anytime the index format changes. -- karman

  • fixed stemmer bug introduced in 2.4.4

    stemmer.c had a mix up in the deprecated stemmer assignments for "Stemmer_en" and "Stem". Also fixed stemmer.h so that 2.4.3 indexes can be read correctly. -- karman

  • Now fork/exec to run filters

    FileFilter* was using popen to run the filter, which could pass user data through the shell. It now uses fork/exec if fork is available, which should be everywhere except Windows. On Windows popen is still used, but all parameters are double-quoted. -- moseley
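
    The difference between the two approaches can be sketched in Python (this is not Swish-e's C code; the run_filter helper and its arguments are hypothetical). Passing an argument list starts the program directly, much as fork/exec does, so shell metacharacters in a file name are never interpreted:

```python
import subprocess
import sys


def run_filter(program_args, filename):
    """Run a filter program directly (no shell) and return its stdout.

    The file name is passed as a single argv entry, so characters such
    as ';' or '$' reach the filter unchanged instead of being parsed
    by a shell.
    """
    result = subprocess.run(
        program_args + [filename],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in "filter": a child Python process that echoes its argument.
    echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])"]
    print(run_filter(echo, "notes; echo pwned.doc"))
```

    With a shell in between (as with popen), the same file name would be split at the semicolon and the second half run as a command.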

  • fixed signed/unsigned warnings from gcc 4.x

    Cleaned up search.c to catch mismatched signedness warnings from newer GCC versions. This issue pre-existed 2.4.4 but the new wildcard features in search.c made for a lot more warnings. -- karman

  • Makefile.mingw included in distrib

    Modified root Makefile to include the perl/Makefile.mingw file. -- karman

Version 2.4.4 - 11 Oct 2006

  • Version 2.4.4 RC1

    Release Candidate 1 for 2.4.4, 2 Oct 2006.

  • quote fix for FileFilter config param

    Ludovic Drolez contributed a patch to fix a quoting issue with filenames. This affects non-Windows builds only.

  • SWISH::Filter now on CPAN

    SWISH::Filter is now available on http://cpan.org/. The version in the distribution is not kept in sync with the CPAN version. Install the CPAN version if you want the latest and greatest version.

  • SWISH::API updated to 0.04

    Added several fixes, including:

    • Perlish method names from mpeters@plusthree.com
    • switched to XSLoader with DynaLoader as fallback
    • added VERSION method to satisfy some versions of MakeMaker
    • Fuzzify() method now actually works as advertised

  • added proximity feature and single character wildcard with '?' instead of '*'

    Herman Knoops contributed these patches. See http://swish-e.org/archive/2006-05/10543.html

    Error messages were also changed to better reflect correct use of wildcards.

  • fixed bug when using DoubleMetaphone

    Fixed problem reported by Andreas Völter where a query that generated a two-word query with DoubleMetaphone fuzzy mode was not working.

  • fix sparc64 property issue

    Sorithy Seng (pourlassi@gmail.com) submitted a patch against docprop.c to fix an issue on sparc64 platforms. It is unknown whether this bug affected other 64-bit architectures.

  • fixed bug when StopWords resulted in no unique words

    Added a check in db_native.c that some words exist before writing the index.

  • updates to SWISH-RUN.1

    Added doc for -u and -r options.

  • filename only in SWISH::Filters

    Added a fix to SWISH::Filters::pp2html and SWISH::Filters::XLtoHTML so that only the file name, without the full path, is saved as the title.

  • Removed Stem and Stemmer_en

    The legacy Porter stemmer was removed. It had been deprecated for some time. A warning is issued if the old stemmer is specified in the config file, and Stemmer_en1 is used instead.

  • GPL'd all the source files with the new Swish-e License

    After a source code review, the developers decided to put Swish-e under the GPL with a special exception for linking against libswish-e. See http://swish-e.org/license.html for the details.

  • Fixed Segfault with updating incremental index

    Dobrica Pavlinusic reported a segfault after updating an index multiple times. José provided updated worddata.c. - April 27, 2005

  • Fixed NOT check with incremental indexes

    Swish was returning results for deleted files when the NOT operator was used.

  • Fixed bug when using old parsers with zero length input

    Thomas Angst reported swish consuming memory when using -S prog to process large number of empty documents.

    When -S prog generated a zero length file the old parsers (e.g. TXT) would attempt to read in *all* content from the -S prog program into a buffer. The old parser incorrectly assumed it was reading from a filter and tried to read to eof().
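
    A minimal sketch of the bounded-read idea, in Python rather than Swish-e's C. It assumes each document arrives on a shared stream with a declared byte length; the read_document name and its length parameter are illustrative, not Swish-e's API:

```python
import io


def read_document(stream, content_length):
    """Read exactly content_length bytes for one document.

    A zero-length document yields b"" and leaves the stream untouched,
    rather than falling back to reading until end of file.
    """
    if content_length == 0:
        return b""
    data = stream.read(content_length)
    if len(data) != content_length:
        raise EOFError("stream ended before the document was complete")
    return data


if __name__ == "__main__":
    # Two documents back to back: an empty one, then five bytes.
    stream = io.BytesIO(b"hello")
    print(read_document(stream, 0))  # empty document
    print(read_document(stream, 5))  # next document is still intact
```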

  • Changes to ParserWarnLevel

    The default value for ParserWarnLevel was changed from zero to two.

    The ParserWarnLevel controls the error handling of the libxml2 parser. The higher the setting, the more verbose the output. The change to the default is to report when libxml2 has problems parsing a document (which often results in processing only part of a document).

    To get the old behavior, either set ParserWarnLevel to zero in your config file, or use the new -W command line option to set the ParserWarnLevel at run time. If ParserWarnLevel is set in the config file, it will override the -W option.

    Also, to see UTF-8 to 8859-1 conversion errors set ParserWarnLevel to 3 or more. Previously, these warnings were issued at a ParserWarnLevel of one.
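
    The precedence described above (config file first, then -W, then the default of two) can be written out as a small Python sketch. The function and parameter names are made up for illustration:

```python
DEFAULT_PARSER_WARN_LEVEL = 2  # default stated in this entry


def effective_warn_level(config_value=None, cli_value=None):
    """Resolve the warn level: config file, then -W, then the default."""
    if config_value is not None:
        return config_value
    if cli_value is not None:
        return cli_value
    return DEFAULT_PARSER_WARN_LEVEL
```

    Note that a config value of zero still wins over -W, which is why the check is against None rather than truthiness.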

  • Documentation changes

    Removed all the target documentation (html, pdf, ps) from cvs. There's now a separate cvs module "swish_website" that is used to generate both the website and the html docs. If building swish-e from cvs please see the README.cvs file for instructions.

  • Fixed bug in pre-sorted indexes with USE_BTREE

    Gunnar Mätzler reported a problem with reading the pre-sorted property index tables when running with USE_BTREE (--enable-incremental). Not all entries were being written to disk. It was unclear whether the "array" code used for pre-sorted indexes would be slower with USE_BTREE, so a separate define, USE_PRESORT_ARRAY, was added to enable that code when USE_BTREE is set. This allows using the old integer arrays with USE_BTREE. Gunnar reported that this is working, but more testing is needed: the speed of the array code needs to be compared with the non-array code, and the USE_PRESORT_ARRAY code needs to be verified.

  • Add strcoll() usage for sorting properties

    Andreas Seltenreich provided a patch to use strcoll when sorting properties. strcoll is locale dependent.
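
    To illustrate what locale-dependent sorting means, here is a Python sketch using the standard library's wrapper around strcoll. The sort_properties helper is hypothetical and is not how Swish-e stores or sorts its properties:

```python
import functools
import locale


def sort_properties(values, collate_locale="C"):
    """Sort strings with strcoll under the given LC_COLLATE locale."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, collate_locale)
        return sorted(values, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

    In the "C" locale the order follows character codes, so upper case sorts before lower case. Other locales may order the same strings differently, and are only usable if they are installed on the system.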

  • Fix incremental indexing when adding back a file

    Jose fixed a problem with incremental indexing where a file could not be added back to the index once removed.

    Patch initially provided by Dobrica Pavlinusic:

        http://swish-e.org/Discussion/archive/2004-12/8694.html

  • Documentation correction

    A change in the default way the index is compressed was not documented in 2.4.3. The change resulted in larger indexes. See CompressPositions below and in SWISH-CONFIG.

  • libxml2 UTF-8 conversion failures

    Fixed an issue where a UTF-8 to Latin1 encoding failure would skip more input than just the failed character. Libxml2 passes swish text that is not null-terminated, but the libxml2 functions to skip UTF-8 chars expected a null-terminated string. Replaced the libxml2 call with a fixed version.
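
    The general technique, skipping one UTF-8 character while respecting an explicit buffer length instead of relying on a terminator, might look like this in Python. This is a sketch of the idea only, not the replacement function in Swish-e, and utf8_char_len is an invented name:

```python
def utf8_char_len(buf, pos, end):
    """Return how many bytes to skip for the character starting at pos.

    The length implied by the lead byte is clamped to `end`, so the
    scan never runs past the buffer even when it is not terminated.
    """
    lead = buf[pos]
    if lead < 0x80:
        length = 1          # ASCII
    elif lead >> 5 == 0b110:
        length = 2
    elif lead >> 4 == 0b1110:
        length = 3
    elif lead >> 3 == 0b11110:
        length = 4
    else:
        length = 1          # stray continuation or invalid byte: skip one
    return min(length, end - pos)
```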

Version 2.4.3 - December 9, 2004

  • New config directive: CompressPositions

    This option enables zlib compression for word data in the index. Previously, word data was always compressed, which resulted in slower wildcard searches. The default now is not to compress the word data, which results in larger index files. Set to "YES" to get pre-2.4.3 index sizes.

    [This CHANGES entry was added after 2.4.3 was released]

  • Improved error messages when using incremental indexing

    There was a bit of confusion on how to use incremental indexing (still experimental) so added better logic for error messages.

    Also fixed a logic error when setting the incremental update mode. Caught by Paul Loner.

Version 2.4.3-pr1 - Wed Dec 1 09:52:50 PST 2004

  • "Fixed" libxml2's change in UTF8Toisolat1() return value

    Bernhard Weisshuhn supplied a patch to parser.c for checking the return value of UTF8Toisolat1(). Seems that libxml2 now returns the number of characters converted instead of zero for success.

       http://bugzilla.gnome.org/show_bug.cgi?id=153937

  • Added swish-config and pkg-config

    Swish now provides a swish-config script and config file for the pkg-config utility. These tools help when building programs that link with the swish-e library.

    The SWISH::API Makefile.PL program uses swish-config to locate the installation directory of swish-e. This should make building SWISH::API easier when swish-e is installed in a non-standard location.

  • Fixed rank bias in merge

    Peter van Dijk noticed that MetaNamesRank settings were not being copied to the output index when merging.

  • Added SwishFuzzy function

    SwishFuzzy function (SWISH::API::Fuzzy) lets you stem a word without first searching. This might be helpful for playing with queries prior to the search.

  • Fixed translate character table

    Michael Levy found an error in the table used to translate 8859-1 to ascii7. Luckily, it was an upper case translation and the table is only used on lower case characters.

  • MetaNamesRank documentation

    Changed the 'not yet implemented' caveat to 'implemented but experimental'.

  • Added Continuation option to config processing

    You can now use continuation lines in the config file:

        IgnoreWords \
            the \
            am \
            is \
            are \
            was

    There may not be any characters following the backslash.
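
    How continuation lines could be joined is sketched below in Python. The join_continuations helper is illustrative and makes no claim about how Swish-e's config reader treats edge cases such as a file that ends on a backslash:

```python
def join_continuations(lines):
    """Merge lines ending in a backslash with the line that follows.

    Only a backslash that is the very last character counts; a
    backslash followed by a space does not continue the line.
    """
    joined = []
    pending = ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            pending += line[:-1]   # drop the backslash, keep collecting
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        joined.append(pending)     # file ended on a continuation line
    return joined
```

    Splitting the joined line on whitespace then yields the directive name followed by its words.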

  • Fixed Buzzwords (and other word lists entered in the config)

    Words entered in config were not converted to lower case before storing in the index.

  • Fixed metaname mapping problem in Merge

    Peter Karman found an error when merging indexes where the source indexes had the same metanames, but listed in a different order in their config files. Words would then be indexed under the wrong metaID number in the output index.
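
    The underlying idea, matching metanames by name rather than by position, can be shown with a short Python sketch. It assumes metaIDs are simply 1-based positions in each index's list, which is a simplification for illustration; build_meta_id_map is not a Swish-e function:

```python
def build_meta_id_map(source_names, output_names):
    """Map each source metaID to the output metaID with the same name.

    metaIDs here are simply 1-based positions in each index's list
    (an assumption made for this sketch).
    """
    output_ids = {name: i for i, name in enumerate(output_names, start=1)}
    return {
        src_id: output_ids[name]
        for src_id, name in enumerate(source_names, start=1)
    }
```

    For a source index listing "title, author" and an output index listing "author, title", source ID 1 maps to output ID 2 and vice versa, instead of being copied through unchanged.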

  • SWISH::Filters and spider.pl updates

    The web spider spider.pl was updated to work better with SWISH::Filter by default and also make it easier to use the spider default along with a spider config file. See spider.pl for details.

    SWISH::Filter was updated. The way filters are created has changed. If you created your own filters you will need to update them. Take a look at SWISH::Filter and the filters included in the distribution.

  • Updates to Documentation

    Richard Morin submitted formatting and punctuation updates to the README and INSTALL docs.

  • Added -R option to support IDF word weighting in ranking. (karman)

    Added Inverse Document Frequency calculation to the getrank() routine. This will allow the relative frequency of a word in relationship to other words in the query to impact the ranking of documents.

    Example: if 'foo' is present twice as often as 'bar' in the collection as a whole, a search for 'foo bar' will weight documents with 'bar' more heavily (i.e., higher rank) than those with 'foo'.

    The impact is greatest when OR'ing words in a query rather than AND'ing them (which is the default).

    Also added Rank discussion to the FAQ.
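
    The 'foo'/'bar' example can be worked through with the textbook IDF formula, log(N / df). This Python sketch is an assumption for illustration: the exact calculation in getrank() is not given here, and the sketch counts documents containing a word rather than total occurrences:

```python
import math


def idf(total_docs, docs_with_word):
    """Textbook inverse document frequency: log(N / df)."""
    return math.log(total_docs / docs_with_word)


if __name__ == "__main__":
    total = 1000
    print(idf(total, 200))  # 'foo' appears in 200 documents
    print(idf(total, 100))  # 'bar' appears in 100 documents: larger weight
```

    Because 'bar' is half as common, its weight exceeds that of 'foo' by log(2), whatever the collection size.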

  • Updates to the example scripts

    Updated PhraseHighlight.pm as suggested by Bill Schell for an optimization when all words in a document are highlighted.

    Updated search.cgi and PhraseHighlight.pm to use the internal stemmers via the SWISH::API module as suggested by Jonas Wolf.

  • Leak when using C library

    David Windmueller found a memory leak when calling multiple searches on a swish handle. The problem was swish loading the pre-sorted property index on every search, even after the table had been loaded into memory.

  • Swish.cgi now kills swish-e on time out

    The example script swish.cgi uses an alarm (on platforms that support alarm) to abort processing after some number of seconds, but it was not killing the child process, swish-e. Bill Schell submitted a patch to kill the child when the alarm triggers.
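
    swish.cgi is a Perl script and uses alarm; the same pattern of killing the child when a time limit expires is sketched here in Python using the subprocess timeout mechanism instead. The run_with_timeout name is hypothetical:

```python
import subprocess
import sys


def run_with_timeout(args, seconds):
    """Run a child process; kill it if it exceeds the time limit.

    Returns the child's output, or None if it had to be killed.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
    try:
        output, _ = proc.communicate(timeout=seconds)
        return output
    except subprocess.TimeoutExpired:
        proc.kill()          # without this the child would keep running
        proc.communicate()   # reap the killed child
        return None
```

    The important step is the explicit kill: giving up on waiting does not by itself stop the child process.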

  • The template search.tt was renamed to swish.tt

    The template was renamed because it's used by swish.cgi, not by search.cgi, which was confusing.

  • Updates to the search.cgi

    The example script search.cgi was updated to work better with mod_perl and to use external template files and style sheets.

  • New MS Word Filter

    James Job provided the SWISH::Filter::Doc2html filter that uses the wvWare (http://wvware.sourceforge.net/) program for filtering MS Word documents. If both catdoc and wvWare are installed then wvWare will be used.

    wvWare is reported to do a good job at converting MS Word docs to HTML. In a few tests it did work well, but in other cases it failed to generate correct output. It was also much, much slower than catdoc. I tested with wvWare 0.7.3 on Debian Linux. Testing with both is recommended.

  • Change in way symbolic links are followed

    John-Marc Chandonia pointed out that if a symlink is skipped by FileRules, then the actual file/directory is marked as "already seen" and cannot be indexed by other links or directly.

    Now, files and directories are not marked "already seen" until after passing FileRules (i.e., after a file is actually indexed or a directory is processed).
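
    The ordering of the two checks is what matters, as this simplified Python model shows. It works on (path, file_id) pairs rather than a real directory walk, and select_for_indexing and passes_rules are invented names, not part of Swish-e:

```python
def select_for_indexing(entries, passes_rules):
    """Return the paths to index, visiting each real file at most once.

    entries: iterable of (path, file_id) pairs, where file_id identifies
    the underlying file (for example a device/inode pair).
    passes_rules: callable deciding whether a path is accepted.
    """
    seen = set()
    selected = []
    for path, file_id in entries:
        if not passes_rules(path):
            continue            # rejected: do NOT mark the target as seen
        if file_id in seen:
            continue            # already indexed through another path
        seen.add(file_id)
        selected.append(path)
    return selected
```

    If the file were added to the seen set before the rules check, a rejected symlink would hide the real file from every later path that reaches it.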

  • Could not set SwishSetSort() more than once

    David Windmueller found a problem when trying to set the sort order more than once on an existing search object. Memory was not correctly reset after clearing the previous sort values.

  • Access MetaNames and PropertyNames from API

    Patch provided by Jamie Herre to access the MetaNames and PropertyNames via the C API and to test via the testlib program. SWISH::API also updated to access this data.

  • SwishResultPropertyULong() bug fixed

    David Windmueller reported that SwishResultPropertyULong() was returning ULONG_MAX on all calls. This was fixed.

  • Null written to wrong location in file.c

    Bill Schell with the help of valgrind found a null written past the end of a buffer in file.c in the code that supports the old parsers.

    Also fixed a logic error when setting the incremental update mode. Caught by Paul Loner.

Version 2.4.3-pr1 - Wed Dec 1 09:52:50 PST 2004

  • "Fixed" libxml2's change in UTF8Toisolat1() return value

    Bernhard Weisshuhn supplied a patch to parser.c for checking the return value of UTF8Toisolat1(). Seems that libxml2 now returns the number of characters converted instead of zero for success.

       http://bugzilla.gnome.org/show_bug.cgi?id=153937
  • Added swish-config and pkg-config

    Swish now provides a swish-config script and config file for the pkg-config utility. These tools help when building programs that link with the swish-e library.

    The SWISH::API Makefile.PL program uses swish-config to locate the installation directory of swish-e. This should make building SWISH::API easier when swish-e is installed in a non-standard location.

  • Fixed rank bias in merge

    Peter van Dijk noticed that MetaNamesRank settings were not being copied to the output index when merging.

  • Added SwishFuzzy function

    SwishFuzzy function (SWISH::API::Fuzzy) lets you stem a word without first searching. This might be helpful for playing with queries prior to the search.

  • Fixed translate character table

    Michael Levy found an error in the table used to translate 8859-1 to ascii7. Luckily, it was an upper case translation and the table is only used on lower case characters.

  • MetaNamesRank documentation

    Changed the 'not yet implemented' caveat to 'implemented but experimental'.

  • Added Continuation option to config processing

    You can now use continuation lines in the config file:

        IgnoreWords \
            the \
            am \
            is \
            are \
            was

    There may not be any characters following the backslash.

  • Fixed Buzzwords (and other word lists entered in the config)

    Words entered in config were not converted to lower case before storing in the index.

  • Fixed metaname mapping problem in Merge

    Peter Karman found an error when merging indexes where the source indexes had the same metanames, but listed in a different order in their config files. Words would then be indexed under the wrong metaID number in the output index.

  • SWISH::Filters and spider.pl updates

    The web spider spider.pl was updated to work better with SWISH::Filter by default and also make it easier to use the spider default along with a spider config file. See spider.pl for details.

    SWISH::Filter was updated. The way filters are created has changed. If you created your own filters you will need to update them. Take a look at SWISH::Filter and the filters included in the distribution.

  • Updates to Documentation

    Richard Morin submitted formatting and punctuation dates to the README and INSTALL docs.

  • Added -R option to support IDF word weighting in ranking. (karman)

    Added Inverse Document Frequency calculation to the getrank() routine. This will allow the relative frequency of a word in relationship to other words in the query to impact the ranking of documents.

    Example: if 'foo' is present twice as often as 'bar' in the collection as a whole, a search for 'foo bar' will weight documents with 'bar' more heavily (i.e., higher rank) than those with 'foo'.

    The impact is greatest when OR'ing words in a query rather than AND'ing them (which is the default).

    Also added Rank discussion to the FAQ.

  • Updates to the example scripts

    Updated PhraseHighlight.pm as suggested by Bill Schell for an optimization when all words in a document are highlighted.

    Updated search.cgi and PhraseHighlight.pm to use the internal stemmers via the SWISH::API module as suggested by Jonas Wolf.

  • Leak when using C library

    David Windmueller found a memory leak when calling multiple searches on a swish handle. The problem was swish loading the pre-sorted property index on every search, even after the table had been loaded into memory.

  • Swish.cgi now kills swish-e on time out

    The example script swish.cgi uses an alarm (on platforms that support alarm) to abort processing after some number of seconds, but it was not killing the child process, swish-e. Bill Schell submitted a patch to kill the child when the alarm triggers.

  • The template search.tt was renamed to swish.tt

    The template was renamed because it's used by swish.cgi, not by search.cgi, which was confusing.

  • Updates to the search.cgi

    The example script search.cgi was updated to work better with mod_perl and to use external template files and style sheets.

  • New MS Word Filter

    James Job provided the SWISH::Filter::Doc2html filter that uses the wvWare (http://wvware.sourceforge.net/) program for filtering MS Word documents. If both catdoc and wvWare are installed then wvWare will be used.

    wvWare is reported to do a good job at converting MS Word docs to HTML. In a few tests it did work well, but other cases it failed to generate correct output. It was also much, much slower than catdoc. I tested with wvWare 0.7.3 on Debian Linux. Testing with both is recommended.

  • Change in way symbolic links are followed

    John-Marc Chandonia pointed out that if a symlink is skipped by FileRules, then the actual file/directory is marked as "already seen" and cannot be indexed by other links or directly.

    Now, files and directories are not marked "already seen" until after passing FileRules (i.e after a file is actually indexed or a directory is processed).

  • Could not set SwishSetSort() more than once

    David Windmueller found a problem when trying to set the sort order more than once on an existing search object. Memory was not correctly reset after clearing the previous sort values.

  • Access MetaNames and PropertyNames from API

    Patch provided by Jamie Herre to access the MetaNames and PropertyNames via the C API and to test via the testlib program. Swish::API also updated to access this data.

  • SwishResultPropertyULong() bug fixed

    David Windmueller reported that SwishResultPropertyULong() was returning ULONG_MAX on all calls. This was fixed.

  • Null written to wrong location in file.c

    Bill Schell with the help of valgrind found a null written past the end of a buffer in file.c in the code that supports the old parsersas a bit of confusion on how to use incremental indexing (still experimental) so added better logic for error messages.

    Also fixed a logic error when setting the incremental update mode. Caught by Paul Loner.

Version 2.4.3-pr1 - Wed Dec 1 09:52:50 PST 2004

  • "Fixed" libxml2's change in UTF8Toisolat1() return value

    Bernhard Weisshuhn supplied a patch to parser.c for checking the return value of UTF8Toisolat1(). Seems that libxml2 now returns the number of characters converted instead of zero for success.

       http://bugzilla.gnome.org/show_bug.cgi?id=153937
  • Added swish-config and pkg-config

    Swish now provides a swish-config script and config file for the pkg-config utility. These tools help when building programs that link with the swish-e library.

    The SWISH::API Makefile.PL program uses swish-config to locate the installation directory of swish-e. This should make building SWISH::API easier when swish-e is installed in a non-standard location.

  • Fixed rank bias in merge

    Peter van Dijk noticed that MetaNamesRank settings were not being copied to the output index when merging.

  • Added SwishFuzzy function

    SwishFuzzy function (SWISH::API::Fuzzy) lets you stem a word without first searching. This might be helpful for playing with queries prior to the search.

  • Fixed translate character table

    Michael Levy found an error in the table used to translate 8859-1 to ascii7. Luckily, it was an upper case translation and the table is only used on lower case characters.

  • MetaNamesRank documentation

    Changed the 'not yet implemented' caveat to 'implemented but experimental'.

  • Added Continuation option to config processing

    You can now use continuation lines in the config file:

        IgnoreWords \
            the \
            am \
            is \
            are \
            was

    There may not be any characters following the backslash.

  • Fixed Buzzwords (and other word lists entered in the config)

    Words entered in config were not converted to lower case before storing in the index.

  • Fixed metaname mapping problem in Merge

    Peter Karman found an error when merging indexes where the source indexes had the same metanames, but listed in a different order in their config files. Words would then be indexed under the wrong metaID number in the output index.

  • SWISH::Filters and spider.pl updates

    The web spider spider.pl was updated to work better with SWISH::Filter by default and also make it easier to use the spider default along with a spider config file. See spider.pl for details.

    SWISH::Filter was updated. The way filters are created has changed. If you created your own filters you will need to update them. Take a look at SWISH::Filter and the filters included in the distribution.

  • Updates to Documentation

    Richard Morin submitted formatting and punctuation dates to the README and INSTALL docs.

  • Added -R option to support IDF word weighting in ranking. (karman)

    Added Inverse Document Frequency calculation to the getrank() routine. This will allow the relative frequency of a word in relationship to other words in the query to impact the ranking of documents.

    Example: if 'foo' is present twice as often as 'bar' in the collection as a whole, a search for 'foo bar' will weight documents with 'bar' more heavily (i.e., higher rank) than those with 'foo'.

    The impact is greatest when OR'ing words in a query rather than AND'ing them (which is the default).

    Also added Rank discussion to the FAQ.

  • Updates to the example scripts

    Updated PhraseHighlight.pm as suggested by Bill Schell for an optimization when all words in a document are highlighted.

    Updated search.cgi and PhraseHighlight.pm to use the internal stemmers via the SWISH::API module as suggested by Jonas Wolf.

  • Leak when using C library

    David Windmueller found a memory leak when calling multiple searches on a swish handle. The problem was swish loading the pre-sorted property index on every search, even after the table had been loaded into memory.

  • Swish.cgi now kills swish-e on time out

    The example script swish.cgi uses an alarm (on platforms that support alarm) to abort processing after some number of seconds, but it was not killing the child process, swish-e. Bill Schell submitted a patch to kill the child when the alarm triggers.

  • The template search.tt was renamed to swish.tt

    The template was renamed because it's used by swish.cgi, not by search.cgi, which was confusing.

  • Updates to the search.cgi

    The example script search.cgi was updated to work better with mod_perl and to use external template files and style sheets.

  • New MS Word Filter

    James Job provided the SWISH::Filter::Doc2html filter that uses the wvWare (http://wvware.sourceforge.net/) program for filtering MS Word documents. If both catdoc and wvWare are installed then wvWare will be used.

    wvWare is reported to do a good job at converting MS Word docs to HTML. In a few tests it did work well, but other cases it failed to generate correct output. It was also much, much slower than catdoc. I tested with wvWare 0.7.3 on Debian Linux. Testing with both is recommended.

  • Change in way symbolic links are followed

    John-Marc Chandonia pointed out that if a symlink is skipped by FileRules, then the actual file/directory is marked as "already seen" and cannot be indexed by other links or directly.

    Now, files and directories are not marked "already seen" until after passing FileRules (i.e after a file is actually indexed or a directory is processed).

  • Could not set SwishSetSort() more than once

    David Windmueller found a problem when trying to set the sort order more than once on an existing search object. Memory was not correctly reset after clearing the previous sort values.

  • Access MetaNames and PropertyNames from API

    Patch provided by Jamie Herre to access the MetaNames and PropertyNames via the C API and to test via the testlib program. Swish::API also updated to access this data.

  • SwishResultPropertyULong() bug fixed

    David Windmueller reported that SwishResultPropertyULong() was returning ULONG_MAX on all calls. This was fixed.

  • Null written to wrong location in file.c

    Bill Schell with the help of valgrind found a null written past the end of a buffer in file.c in the code that supports the old parsersas a bit of confusion on how to use incremental indexing (still experimental) so added better logic for error messages.

    Also fixed a logic error when setting the incremental update mode. Caught by Paul Loner.

Version 2.4.3-pr1 - Wed Dec 1 09:52:50 PST 2004

  • "Fixed" libxml2's change in UTF8Toisolat1() return value

    Bernhard Weisshuhn supplied a patch to parser.c for checking the return value of UTF8Toisolat1(). Seems that libxml2 now returns the number of characters converted instead of zero for success.

       http://bugzilla.gnome.org/show_bug.cgi?id=153937
  • Added swish-config and pkg-config

    Swish now provides a swish-config script and config file for the pkg-config utility. These tools help when building programs that link with the swish-e library.

    The SWISH::API Makefile.PL program uses swish-config to locate the installation directory of swish-e. This should make building SWISH::API easier when swish-e is installed in a non-standard location.

  • Fixed rank bias in merge

    Peter van Dijk noticed that MetaNamesRank settings were not being copied to the output index when merging.

  • Added SwishFuzzy function

    SwishFuzzy function (SWISH::API::Fuzzy) lets you stem a word without first searching. This might be helpful for playing with queries prior to the search.

  • Fixed translate character table

    Michael Levy found an error in the table used to translate 8859-1 to ascii7. Luckily, it was an upper case translation and the table is only used on lower case characters.

  • MetaNamesRank documentation

    Changed the 'not yet implemented' caveat to 'implemented but experimental'.

  • Added Continuation option to config processing

    You can now use continuation lines in the config file:

        IgnoreWords \
            the \
            am \
            is \
            are \
            was

    There may not be any characters following the backslash.

  • Fixed Buzzwords (and other word lists entered in the config)

    Words entered in config were not converted to lower case before storing in the index.

  • Fixed metaname mapping problem in Merge

    Peter Karman found an error when merging indexes where the source indexes had the same metanames, but listed in a different order in their config files. Words would then be indexed under the wrong metaID number in the output index.

  • SWISH::Filters and spider.pl updates

    The web spider spider.pl was updated to work better with SWISH::Filter by default and also make it easier to use the spider default along with a spider config file. See spider.pl for details.

    SWISH::Filter was updated. The way filters are created has changed. If you created your own filters you will need to update them. Take a look at SWISH::Filter and the filters included in the distribution.

  • Updates to Documentation

    Richard Morin submitted formatting and punctuation updates to the README and INSTALL docs.

  • Added -R option to support IDF word weighting in ranking. (karman)

    Added Inverse Document Frequency calculation to the getrank() routine. This will allow the relative frequency of a word in relationship to other words in the query to impact the ranking of documents.

    Example: if 'foo' is present twice as often as 'bar' in the collection as a whole, a search for 'foo bar' will weight documents with 'bar' more heavily (i.e., higher rank) than those with 'foo'.

    The impact is greatest when OR'ing words in a query rather than AND'ing them (which is the default).

    Also added Rank discussion to the FAQ.
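
    To make the 'foo bar' example concrete, here is a small Python sketch using the textbook IDF formula log(N / df). The formula, the collection size, and the document counts are assumptions chosen for illustration; the actual calculation in getrank() may differ.

```python
import math


def idf(total_docs, docs_with_word):
    """Textbook inverse document frequency: rarer words get a larger weight."""
    return math.log(total_docs / docs_with_word)


total_docs = 1000                    # hypothetical collection size
doc_freq = {"foo": 200, "bar": 100}  # 'foo' occurs twice as often as 'bar'

weights = {word: idf(total_docs, count) for word, count in doc_freq.items()}
for word, weight in weights.items():
    print(f"{word}: {weight:.3f}")
```

    Under these assumptions 'bar' receives the larger weight, so a document matching 'bar' contributes more to the rank than one matching only 'foo'.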

  • Updates to the example scripts

    Updated PhraseHighlight.pm as suggested by Bill Schell for an optimization when all words in a document are highlighted.

    Updated search.cgi and PhraseHighlight.pm to use the internal stemmers via the SWISH::API module as suggested by Jonas Wolf.

  • Leak when using C library

    David Windmueller found a memory leak when calling multiple searches on a swish handle. The problem was swish loading the pre-sorted property index on every search, even after the table had been loaded into memory.

  • Swish.cgi now kills swish-e on time out

    The example script swish.cgi uses an alarm (on platforms that support alarm) to abort processing after some number of seconds, but it was not killing the child process, swish-e. Bill Schell submitted a patch to kill the child when the alarm triggers.
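
    The pattern can be sketched in Python as shown below. Note the differences from the real script: swish.cgi is written in Perl and uses alarm, whereas this sketch uses the timeout support in Python's subprocess module, and a sleeping child process stands in for a slow swish-e search. The helper name run_with_timeout is hypothetical.

```python
import subprocess
import sys


def run_with_timeout(command, seconds):
    """Run command and kill the child process if it exceeds the time limit."""
    child = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    try:
        out, _ = child.communicate(timeout=seconds)
        return child.returncode, out
    except subprocess.TimeoutExpired:
        child.kill()         # without this the child would keep running
        child.communicate()  # reap the process so no zombie is left behind
        return None, b""


# Stand-in for a slow search: a child that sleeps for 30 seconds.
slow_child = [sys.executable, "-c", "import time; time.sleep(30)"]
status, _ = run_with_timeout(slow_child, seconds=1)
print("timed out" if status is None else f"exit status {status}")
```

    Aborting only the parent's wait, as the script did before the patch, would leave the child running after the time out.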

  • The template search.tt was renamed to swish.tt

    The template was renamed because it's used by swish.cgi, not by search.cgi, which was confusing.

  • Updates to the search.cgi

    The example script search.cgi was updated to work better with mod_perl and to use external template files and style sheets.

  • New MS Word Filter

    James Job provided the SWISH::Filter::Doc2html filter that uses the wvWare (http://wvware.sourceforge.net/) program for filtering MS Word documents. If both catdoc and wvWare are installed then wvWare will be used.

    wvWare is reported to do a good job at converting MS Word docs to HTML. In a few tests it did work well, but in other cases it failed to generate correct output. It was also much, much slower than catdoc. I tested with wvWare 0.7.3 on Debian Linux. Testing with both is recommended.

  • Change in way symbolic links are followed

    John-Marc Chandonia pointed out that if a symlink is skipped by FileRules, then the actual file/directory is marked as "already seen" and cannot be indexed by other links or directly.

    Now, files and directories are not marked "already seen" until after passing FileRules (i.e., after a file is actually indexed or a directory is processed).
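
    The new ordering can be sketched in Python as below. This is a simplified model, not the indexer's code: index_paths and passes_file_rules are hypothetical names, and each candidate is given as a (path, real path) pair rather than resolved from the file system.

```python
def index_paths(candidates, passes_file_rules):
    """Index each real file once, marking it seen only after the rules pass.

    candidates: (path, real_path) pairs; for a symlink the two differ.
    passes_file_rules: hypothetical predicate standing in for FileRules.
    """
    seen = set()
    indexed = []
    for path, real_path in candidates:
        if real_path in seen:
            continue
        if not passes_file_rules(path):
            # Skipped by the rules: do not mark the target as seen.
            continue
        seen.add(real_path)
        indexed.append(path)
    return indexed


candidates = [
    ("/docs/skip-me-link", "/data/report.html"),  # symlink rejected by the rules
    ("/data/report.html", "/data/report.html"),   # the real file itself
]
print(index_paths(candidates, lambda p: "skip-me" not in p))
# ['/data/report.html']
```

    Had the target been added to the seen set before the rule check, the second entry would have been dropped, which is the behavior described in the report.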

  • Could not set SwishSetSort() more than once

    David Windmueller found a problem when trying to set the sort order more than once on an existing search object. Memory was not correctly reset after clearing the previous sort values.

  • Access MetaNames and PropertyNames from API

    Patch provided by Jamie Herre to access the MetaNames and PropertyNames via the C API and to test via the testlib program. SWISH::API was also updated to access this data.

  • SwishResultPropertyULong() bug fixed

    David Windmueller reported that SwishResultPropertyULong() was returning ULONG_MAX on all calls. This was fixed.

  • Null written to wrong location in file.c

    Bill Schell, with the help of valgrind, found a null written past the end of a buffer in file.c in the code that supports the old parsers.

    There was a bit of confusion on how to use incremental indexing (still experimental), so better logic for error messages was added.

    Also fixed a logic error when setting the incremental update mode. Caught by Paul Loner.
