This fixes the POD for Dezi::Config.
The original swish-e.org site is no more, but the source code is available on GitHub.
As a courtesy, the documentation is available to browse here on dezi.org.
Dezi 0.4.0 has been released to the CPAN.
This release explicitly requires Dezi::App instead of SWISH::Prog and removes all references to the older SWISH::Prog classes.
As mentioned a couple of months ago, Dezi has been Moosified. The final port of the older code has been released to CPAN as Dezi::App. The 0.400 version of Search::OpenSearch::Engine::Lucy uses Dezi::App instead of SWISH::Prog::Lucy.
Performance is comparable to the older SWISH::Prog-based code, and the new code should provide a cleaner, more community-conformant base upon which to build.
The configuration directives MetaNames and PropertyNames are used to define fields in Dezi. The names PropertyNames and MetaNames originate with Swish-e (see the Swish-e FAQ for a description).
Here are some notes on the differences between those two directives and how they are implemented for Dezi with Apache Lucy. The relevant Perl class is Dezi::Lucy::Indexer (or SWISH::Prog::Lucy::Indexer, depending on your version of Dezi).
- A field defined as a MetaName, a PropertyName, or both can be searched.
- Fields are matched against tag names in your XML/HTML documents. See also the TagAlias, UndefinedMetaTags, UndefinedXMLAttributes, and XMLClassAttributes directives.
- You can alias field names with MetaNamesAlias and PropertyNamesAlias.
- MetaNames are tokenized, case-insensitive, and (optionally, with FuzzyIndexingMode) stemmed.
- PropertyNames are stored, case-sensitive strings.
- If a field is defined as both a MetaName and PropertyName, then it will be tokenized.
- If a field is defined only as a MetaName, it will be parsed but not stored. That means you can search on the field, but trying to retrieve the field’s value from the results will cause a fatal error.
- If a field is defined only as a PropertyName, it will be parsed and stored, but it will not be tokenized. That means the field’s contents are stored without being split up into words.
- You can control the parsing and storage of PropertyName-only fields with the following additional directives:
  - PropertyNamesCompareCase – case-sensitive search
  - PropertyNamesIgnoreCase – case-insensitive search (default)
  - PropertyNamesNoStripChars – preserve whitespace
- There are two default MetaNames defined: swishdefault and swishtitle.
- There are two default PropertyNames defined: swishtitle and swishdescription.
- The libswish3 XML and HTML parsers will automatically treat a <title> tag as swishtitle. Likewise, they will treat a <body> tag as swishdescription.
- Things get complicated quickly when defining fields. Experiment with small test cases to arrive at the configuration that works best with your application.
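The rules in the list above can be condensed into a small truth table. The following Python sketch is only an illustration of those rules, not Dezi code: the names FieldBehavior and field_behavior are hypothetical, and the handling of a field declared with neither directive is an assumption (in practice that case depends on directives such as UndefinedMetaTags).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldBehavior:
    """Hypothetical summary of how a declared field behaves."""
    searchable: bool
    tokenized: bool
    stored: bool


def field_behavior(is_metaname: bool, is_propertyname: bool) -> FieldBehavior:
    """Apply the MetaNames/PropertyNames rules from the list above."""
    if not (is_metaname or is_propertyname):
        # Assumption for this sketch: an undeclared field is not indexed.
        return FieldBehavior(searchable=False, tokenized=False, stored=False)
    return FieldBehavior(
        searchable=True,         # either directive makes the field searchable
        tokenized=is_metaname,   # MetaNames are tokenized, also when combined
        stored=is_propertyname,  # only PropertyNames keep the value for retrieval
    )


if __name__ == "__main__":
    for meta, prop in [(True, False), (False, True), (True, True)]:
        print(f"MetaName={meta} PropertyName={prop} -> {field_behavior(meta, prop)}")
```

Reading the table this way makes the MetaName-only pitfall easy to spot: the field is searchable, but stored is false, so there is no value to retrieve from the results.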
A post at blogs.perl.org describes the new version of Dezi currently underway.
The dezi-client for Ruby version 1.1.0 has been pushed to https://rubygems.org/gems/dezi-client.
This new version uses the Faraday+Excon HTTP client and fixes a bug with multiple values for GET params.
Dezi 0.2.12 has been released to CPAN: https://metacpan.org/release/Dezi
The Changelog:
0.002012 27 Feb 2014
- fix Config docs re: Dezi::Admin, add auto_commit note
- add --auto_commit (--ac) option to toggle auto_commit via cli
Dezi is built on top of SWISH::3 and SWISH::Prog, which reserve several built-in field names.
Here’s a list of the reserved, built-in field names:
- swishdefault
- swishdescription
- swishdocpath
- swishdocsize
- swishencoding
- swishlastmodified
- swishmime
- swishparser
- swishtitle
- swishwordnum
The following method names are reserved in SWISH::Prog::Result and are mapped to the built-in field names:
built-in | method |
---|---|
swishdocpath | uri |
swishlastmodified | mtime |
swishtitle | title |
swishdescription | summary |
Dezi results come from Search::OpenSearch which uses the method names from SWISH::Prog::Result as the default attribute names.
In addition, the SWISH::3 parser aliases some common tag names to built-in fields, in order for HTML documents to get parsed in a more intuitive way. Those tags are title, which is mapped to swishtitle, and body, which is mapped to swishdescription.
Avoid the use of any of these built-in field or method names when you are defining fields in your Dezi configuration. Behavior is unpredictable if there are any namespace collisions.
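One way to catch collisions before they reach the indexer is to check proposed field names against the reserved names listed in this post. The Python sketch below is a hypothetical helper, not part of Dezi: find_collisions and the three set names are invented for illustration, and lowercasing the candidates before comparison is an assumption.

```python
# Reserved built-in field names, copied from the list above.
BUILTIN_FIELDS = {
    "swishdefault", "swishdescription", "swishdocpath", "swishdocsize",
    "swishencoding", "swishlastmodified", "swishmime", "swishparser",
    "swishtitle", "swishwordnum",
}

# SWISH::Prog::Result method names that map to built-in fields.
RESULT_METHODS = {"uri", "mtime", "title", "summary"}

# Tag names the SWISH::3 parser aliases to built-in fields.
ALIASED_TAGS = {"title", "body"}


def find_collisions(field_names):
    """Return the proposed names that clash with a reserved name, sorted."""
    reserved = BUILTIN_FIELDS | RESULT_METHODS | ALIASED_TAGS
    # Assumption: compare case-insensitively by lowercasing each candidate.
    candidates = {name.lower() for name in field_names}
    return sorted(candidates & reserved)


if __name__ == "__main__":
    proposed = ["author", "Title", "swishdocpath", "uri", "category"]
    print(find_collisions(proposed))
```

An empty result means none of the proposed names appears in the lists above; it does not guarantee the configuration is otherwise valid.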
New in Dezi 0.2.10 is the --elastic feature. This option is shorthand for:
engine_config => {
    indexer_config => {
        config => {
            UndefinedMetaTags => 'autoall',
        }
    }
}
But instead of all that, just pass the --elastic option when you start the server:
% dezi --elastic
The --elastic feature makes your Dezi server act like Elasticsearch: fields are created simply by adding a document that contains them. So when you do this:
$ curl -XPOST 'http://localhost:5000/index/blog/post/1' -d '
{
  "user": "dilbert",
  "postDate": "2011-12-15",
  "body": "Search is hard. Search should be easy.",
  "title": "On search"
}' -H 'Content-Type: application/json'
the fields for user and postdate spring into existence.
Note that all field names are lowercased, so postDate becomes postdate.
Note also that there are some reserved field names, so title is not added as a field because the title field is already aliased to swishtitle. Likewise, body is aliased to swishdescription.
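To make the naming rules concrete, here is a rough Python sketch that predicts which fields the example document would introduce. It models only the two rules described above (lowercasing and the title/body aliases); it does not talk to a Dezi server, the function name elastic_fields is hypothetical, and it assumes a flat JSON object with top-level keys only.

```python
import json

# Tag names already aliased to built-in fields, per the notes above.
ALIASED_TAGS = {"title": "swishtitle", "body": "swishdescription"}


def elastic_fields(json_text):
    """Split a document's keys into newly created fields and aliased ones.

    Returns (created, aliased): a sorted list of new lowercased field
    names, and a dict mapping each aliased key to its built-in field.
    """
    doc = json.loads(json_text)
    created = []
    aliased = {}
    for key in doc:
        name = key.lower()  # all field names are lowercased
        if name in ALIASED_TAGS:
            aliased[name] = ALIASED_TAGS[name]  # no new field is created
        else:
            created.append(name)
    return sorted(created), aliased


if __name__ == "__main__":
    post = """
    {
      "user": "dilbert",
      "postDate": "2011-12-15",
      "body": "Search is hard. Search should be easy.",
      "title": "On search"
    }
    """
    created, aliased = elastic_fields(post)
    print("created:", created)
    print("aliased:", aliased)
```

For the example post, the sketch reports user and postdate as new fields, while title and body fall through to their built-in aliases.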
The elastic feature requires the following supporting modules:
- SWISH::3 1.000006
- SWISH::Prog::Lucy 0.17