MetaNames and PropertyNames

The configuration directives MetaNames and PropertyNames are used to define fields in Dezi. The names PropertyNames and MetaNames originate with Swish-e (see the Swish-e FAQ for a description).

Here are some notes on the differences between those two directives and how they are implemented for Dezi with Apache Lucy. The relevant Perl class is Dezi::Lucy::Indexer (or SWISH::Prog::Lucy::Indexer, depending on your version of Dezi).

  • A field defined as a MetaName, a PropertyName, or both can be searched.
  • Fields are matched against tag names in your XML/HTML documents. See also the TagAlias, UndefinedMetaTags, UndefinedXMLAttributes,
    and XMLClassAttributes directives.
  • You can alias field names with MetaNamesAlias and PropertyNamesAlias.
  • MetaNames are tokenized and case-insensitive and (optionally, with FuzzyIndexingMode) stemmed.
  • PropertyNames are stored, case-sensitive strings.
  • If a field is defined as both a MetaName and PropertyName, then it will be tokenized.
  • If a field is defined only as a MetaName, it will be parsed but not stored. That means you can search on the field, but trying to retrieve the field’s value from the results will cause a fatal error.
  • If a field is defined only as a PropertyName, it will be parsed and stored, but it will not be tokenized. That means the field’s contents are stored without being split up into words.
  • You can control the parsing and storage of PropertyName-only fields with the following additional directives:
    • PropertyNamesCompareCase – case sensitive search
    • PropertyNamesIgnoreCase – case insensitive search (default)
    • PropertyNamesNoStripChars – preserve whitespace
  • There are two default MetaNames defined: swishdefault and swishtitle.
  • There are two default PropertyNames defined: swishtitle and swishdescription.
  • The libswish3 XML and HTML parsers automatically treat a <title> tag as swishtitle. Likewise, they treat a <body> tag as swishdescription.
  • Things get complicated quickly when defining fields. Experiment with small test cases to arrive at the configuration that works best with your application.
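The rules above can be summarized in a small Python sketch. This is only a model of the behavior described in the list, not Dezi or Lucy code. The function name field_behavior and the example fields author and isbn are hypothetical, and the sketch assumes that a field defined as both a MetaName and a PropertyName is stored as well as tokenized.

```python
def field_behavior(name, meta_names, property_names):
    """Model how a field is treated, following the notes above (not Dezi's code)."""
    is_meta = name in meta_names
    is_prop = name in property_names
    return {
        # Either directive makes the field searchable.
        "searchable": is_meta or is_prop,
        # Only MetaNames are split into words.
        "tokenized": is_meta,
        # Only PropertyNames can be retrieved from results.
        "stored": is_prop,
    }


# The two default sets come from the notes above; "author" and "isbn"
# are hypothetical fields added for illustration.
meta_names = {"swishdefault", "swishtitle", "author"}
property_names = {"swishtitle", "swishdescription", "isbn"}

for name in ("author", "isbn", "swishtitle"):
    print(name, field_behavior(name, meta_names, property_names))
```

In this model, author (MetaName only) is searchable but cannot be retrieved, isbn (PropertyName only) is stored as a single untokenized string, and swishtitle gets both treatments.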

Ruby dezi-client 1.1.0 released

The dezi-client for Ruby version 1.1.0 has been pushed to https://rubygems.org/gems/dezi-client.

This new version uses the Faraday+Excon HTTP client and fixes a bug with multiple values for GET params.

Reserved fields

Dezi is built on top of SWISH::3 and SWISH::Prog, which reserve several built-in field names.

Here’s a list of the reserved, built-in field names:

  • swishdefault
  • swishdescription
  • swishdocpath
  • swishdocsize
  • swishencoding
  • swishlastmodified
  • swishmime
  • swishparser
  • swishtitle
  • swishwordnum

SWISH::Prog::Result reserves the following method names, which are mapped to built-in field names:

built-in field      method
swishdocpath        uri
swishlastmodified   mtime
swishtitle          title
swishdescription    summary

Dezi results come from Search::OpenSearch which uses the method names from SWISH::Prog::Result as the default attribute names.

In addition, the SWISH::3 parser aliases some common tag names to built-in fields so that HTML documents get parsed in a more intuitive way: title is mapped to swishtitle, and body is mapped to swishdescription.

Avoid the use of any of these built-in field or method names when you are defining fields in your Dezi configuration. Behavior is unpredictable if there are any namespace collisions.
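One way to catch collisions early is to check your proposed field names before adding them to a configuration. The Python sketch below is an illustration only. The helper find_collisions is hypothetical and not part of Dezi, and it assumes names should be compared case-insensitively. The reserved names are copied from the lists above.

```python
# Built-in field names reserved by SWISH::3 and SWISH::Prog (from the list above).
RESERVED_FIELDS = {
    "swishdefault", "swishdescription", "swishdocpath", "swishdocsize",
    "swishencoding", "swishlastmodified", "swishmime", "swishparser",
    "swishtitle", "swishwordnum",
}
# Method names reserved in SWISH::Prog::Result.
RESERVED_METHODS = {"uri", "mtime", "title", "summary"}
# Tag names the SWISH::3 parser aliases to built-in fields.
ALIASED_TAGS = {"title", "body"}


def find_collisions(field_names):
    """Return the proposed names that clash with a reserved name.

    Comparison is case-insensitive, which is an assumption of this sketch.
    """
    reserved = RESERVED_FIELDS | RESERVED_METHODS | ALIASED_TAGS
    return [name for name in field_names if name.lower() in reserved]


# Hypothetical field names a configuration might want to define.
print(find_collisions(["author", "Summary", "swishdocpath", "body"]))
```

An empty list means none of the proposed names appear in the reserved lists above.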

The --elastic feature

New in Dezi 0.2.10 is the --elastic feature. This option is shorthand for:

 engine_config => {
   indexer_config => {
     config => {
       UndefinedMetaTags => 'autoall',
     }
   }
 }

But instead of all that, just pass the --elastic option when you start the server:

% dezi --elastic

The --elastic feature makes your Dezi server act like Elasticsearch: fields are created simply by adding a document that contains them. So when you do this:

$ curl -XPOST 'http://localhost:5000/index/blog/post/1' -d '
{ 
    "user": "dilbert", 
    "postDate": "2011-12-15", 
    "body": "Search is hard. Search should be easy." ,
    "title": "On search"
}' -H 'Content-Type: application/json'

the fields for user and postdate spring into existence.

Note that all field names are lowercased, so postDate becomes postdate.

Note also that there are some reserved field names, so title is not added as a field because the title field is already aliased to swishtitle. Likewise, body is aliased to swishdescription.
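To make the two notes above concrete, here is a small Python sketch that predicts which fields the example document would create. It models only the behavior described in this post (lowercasing, plus the title and body aliases) and does not talk to a Dezi server. The function name elastic_fields is hypothetical, and the sketch does not account for the other reserved field names.

```python
import json

# Tag names that are already aliased to built-in fields (from the notes above).
ALIASED_TAGS = {"title": "swishtitle", "body": "swishdescription"}


def elastic_fields(document_json):
    """Split a document's keys into newly created fields and aliased ones.

    A model of the behavior described above, not Dezi's implementation.
    """
    doc = json.loads(document_json)
    created = []
    aliased = {}
    for key in doc:
        name = key.lower()  # all field names are lowercased
        if name in ALIASED_TAGS:
            aliased[name] = ALIASED_TAGS[name]
        else:
            created.append(name)
    return created, aliased


example = """
{
    "user": "dilbert",
    "postDate": "2011-12-15",
    "body": "Search is hard. Search should be easy.",
    "title": "On search"
}
"""

created, aliased = elastic_fields(example)
print(created)  # fields that would be created on the fly
print(aliased)  # keys that map to existing built-in fields
```

For the example document, created contains user and postdate, while body and title show up in aliased.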

The elastic feature requires the following supporting modules:

  • SWISH::3 1.000006
  • SWISH::Prog::Lucy 0.17