This fixes the POD for Dezi::Config.
The original swish-e.org site is no more, but the source code is available on GitHub.
As a courtesy, the documentation is available to browse here on dezi.org.
Dezi 0.4.0 has been released to the CPAN.
This release explicitly requires Dezi::App instead of SWISH::Prog and removes all references to the older SWISH::Prog classes.
As mentioned a couple of months ago, Dezi has been Moosified. The final port of the older code has been released to CPAN as Dezi::App. The 0.400 version of Search::OpenSearch::Engine::Lucy uses Dezi::App instead of SWISH::Prog::Lucy.
Performance is comparable to the older SWISH::Prog-based code, and the new code should provide a cleaner, more community-conformant base upon which to build.
The configuration directives MetaNames and PropertyNames are used to define fields in Dezi. The names PropertyNames and MetaNames originate with Swish-e (see the Swish-e FAQ for a description).
Here are some notes on the differences between those two directives and how they are implemented for Dezi with Apache Lucy. The relevant Perl class is Dezi::Lucy::Indexer (or SWISH::Prog::Lucy::Indexer, depending on your version of Dezi).
- A field defined as a MetaName, a PropertyName, or both can be searched.
- Fields are matched against tag names in your XML/HTML documents. See also the TagAlias, UndefinedMetaTags, UndefinedXMLAttributes, and XMLClassAttributes directives.
- You can alias field names with MetaNamesAlias and PropertyNamesAlias.
- MetaNames are tokenized and case-insensitive and (optionally, with FuzzyIndexingMode) stemmed.
- PropertyNames are stored, case-sensitive strings.
- If a field is defined as both a MetaName and PropertyName, then it will be tokenized.
- If a field is defined only as a MetaName, it will be parsed but not stored. That means you can search on the field, but trying to retrieve the field's value from the results will cause a fatal error.
- If a field is defined only as a PropertyName, it will be parsed and stored, but it will not be tokenized. That means the field’s contents are stored without being split up into words.
- You can control the parsing and storage of PropertyName-only fields with the following additional directives:
  - PropertyNamesCompareCase – case sensitive search
  - PropertyNamesIgnoreCase – case insensitive search (default)
  - PropertyNamesNoStripChars – preserve whitespace
- There are two default MetaNames defined: swishdefault and swishtitle.
- There are two default PropertyNames defined: swishtitle and swishdescription.
- The libswish3 XML and HTML parsers will automatically treat a <title> tag as swishtitle. Likewise, they will treat a <body> tag as swishdescription.
- Things get complicated quickly when defining fields. Experiment with small test cases to arrive at the configuration that works best with your application.
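As a starting point for such experiments, here is a minimal configuration sketch, nested under engine_config and indexer_config the way a Dezi server config passes directives to the indexer. The field names author, topic and sku are hypothetical, and the space-separated value format is an assumption borrowed from Swish-e config style; check it against the Dezi::Config documentation for your version.

```perl
# Config fragment only; author, topic and sku are hypothetical field names.
engine_config => {
    indexer_config => {
        config => {
            # author: both MetaName and PropertyName, so tokenized and stored.
            # topic:  MetaName only, so searchable but its value is not stored.
            MetaNames     => 'author topic',
            # sku:    PropertyName only, so stored as-is without tokenizing.
            PropertyNames => 'author sku',
        }
    }
}
```

With a fragment like this, a small test document containing all three tags should show quickly whether each field can be searched and retrieved the way the notes above describe.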
A post at blogs.perl.org describes the new version of Dezi currently underway.
The dezi-client for Ruby version 1.1.0 has been pushed to https://rubygems.org/gems/dezi-client.
This new version uses the Faraday+Excon HTTP client and fixes a bug with multiple values for GET params.
Dezi 0.2.12 has been released to CPAN: https://metacpan.org/release/Dezi
The Changelog:
0.002012 27 Feb 2014
- fix Config docs re: Dezi::Admin, add auto_commit note
- add --auto_commit (--ac) option to toggle auto_commit via cli
Dezi is built on top of SWISH::3 and SWISH::Prog, which reserve several built-in field names.
Here’s a list of the reserved, built-in field names:
- swishdefault
- swishdescription
- swishdocpath
- swishdocsize
- swishencoding
- swishlastmodified
- swishmime
- swishparser
- swishtitle
- swishwordnum
The following method names are reserved in SWISH::Prog::Result and are mapped to the built-in field names:
| built-in | method |
|---|---|
| swishdocpath | uri |
| swishlastmodified | mtime |
| swishtitle | title |
| swishdescription | summary |
Dezi results come from Search::OpenSearch which uses the method names from SWISH::Prog::Result as the default attribute names.
In addition, the SWISH::3 parser aliases some common tag names to built-in fields so that HTML documents get parsed in a more intuitive way. Those tags are title, which is mapped to swishtitle, and body, which is mapped to swishdescription.
Avoid the use of any of these built-in field or method names when you are defining fields in your Dezi configuration. Behavior is unpredictable if there are any namespace collisions.
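One way to catch collisions before they reach your configuration is a small pre-flight check. The sketch below is not part of Dezi: reserved_collisions is a hypothetical helper, the proposed field names are made up, and the case-insensitive comparison is an assumption. The reserved names themselves are copied from the lists above.

```perl
use strict;
use warnings;

# Reserved built-in field names, as listed above.
my @BUILT_IN = qw(
    swishdefault swishdescription swishdocpath swishdocsize swishencoding
    swishlastmodified swishmime swishparser swishtitle swishwordnum
);
# SWISH::Prog::Result method names mapped to built-in fields.
my @METHODS = qw( uri mtime title summary );
# Tag names the parser aliases to built-in fields.
my @ALIASED_TAGS = qw( title body );

my %RESERVED = map { $_ => 1 } ( @BUILT_IN, @METHODS, @ALIASED_TAGS );

# Hypothetical helper: return the names that collide with a reserved name.
# Assumption: names are compared case-insensitively.
sub reserved_collisions {
    my @names = @_;
    return grep { $RESERVED{ lc $_ } } @names;
}

my @proposed = qw( author title postDate summary );    # hypothetical fields
my @bad      = reserved_collisions(@proposed);
print "Rename these fields: @bad\n" if @bad;
```

For the hypothetical list above, the check flags title and summary and leaves author and postDate alone.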
New in Dezi 0.2.10 is the --elastic feature. This option is shorthand for:
    engine_config => {
        indexer_config => {
            config => {
                UndefinedMetaTags => 'autoall',
            }
        }
    }
But instead of all that, just pass the --elastic option when you start the server:
% dezi --elastic
The --elastic feature makes your Dezi server act like Elasticsearch: fields are created simply by adding a document that contains them. So when you do this:
    $ curl -XPOST 'http://localhost:5000/index/blog/post/1' -d '
    {
        "user": "dilbert",
        "postDate": "2011-12-15",
        "body": "Search is hard. Search should be easy.",
        "title": "On search"
    }' -H 'Content-Type: application/json'
the fields for user and postdate spring into existence.
Note that all field names are lowercased, so postDate becomes postdate.
Note also that there are some reserved field names, so title is not added as a field because the title field is already aliased to swishtitle. Likewise, body is aliased to swishdescription.
The elastic feature requires the following supporting modules:
- SWISH::3 1.000006
- SWISH::Prog::Lucy 0.17