beginsWith / endsWith / contains


(dawi) #1

Hi,

I have to implement a search backend for our product to replace the
old SQL queries.

The index is built using the following default analyzer settings:
index.analysis.analyzer.default.filter = lowercase
index.analysis.analyzer.default.tokenizer = keyword
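The matching rules below depend on what this analyzer produces. Here is a minimal Python model of it (not Elasticsearch code; all function names are hypothetical). The keyword tokenizer emits the entire field value as one token, and the lowercase filter lowercases that token. Python's `str.lower` only approximates Lucene's lowercase filter.

```python
from typing import Callable, List

TokenFilter = Callable[[str], str]


def keyword_tokenizer(text: str) -> List[str]:
    # The keyword tokenizer does not split: the whole input is a single token.
    return [text]


def lowercase_filter(token: str) -> str:
    return token.lower()


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    # Run the tokenizer, then apply each token filter in order to every token.
    tokens = keyword_tokenizer(text)
    for token_filter in filters:
        tokens = [token_filter(t) for t in tokens]
    return tokens


def default_analyzer(text: str) -> List[str]:
    # Model of the "default" analyzer configured above.
    return analyze(text, [lowercase_filter])


if __name__ == "__main__":
    print(default_analyzer("Foo Bar*"))  # ['foo bar*']
```

Because each value becomes a single token, term, prefix and wildcard queries all operate on the complete (lowercased) field value.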

Now I have to provide the following search methods:

  • exact
  • matchesPattern
  • beginsWith
  • endsWith
  • contains

  1. exact is implemented as a term query.
     • "aaa" matches exactly "aaa"
     • "aa" matches "aa" but not "aaa"
  2. matchesPattern is implemented as a wildcard query.
     • "a" matches "a"
     • "a*" matches "a", "aa", "ab", "a*"
     Question here: how can I search for literal wildcard characters in fields?
     • something like "a*" that would match "a", "aa" and "ab" but not
       "a" and "aa"
  3. beginsWith is implemented as a prefix query.
     • "aaa" matches "aaa", "aaaa", "aaa*"
     • "aa" matches "aa", "a*aa" but not "aaa"
  4. endsWith
     Should behave like beginsWith.
     But what is the best way to implement this feature?
     Is it possible to define multiple analyzers per field and to use the
     reverse filter for this?
       index.analysis.analyzer.reverse.filter=reverse
       index.analysis.analyzer.reverse.tokenizer=keyword
     Or do I have to store a reversed version of the field in the index and
     use a prefix query on it?
  5. contains
     • "a" matches "aa", "aaaa"
     • "a*" matches "a*" or "aa*aa" but not "aaaa"
     Could be implemented using a wildcard query if character escaping is
     possible there.
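To pin down the intended semantics, here is a small Python model of the five methods. It assumes each field value is indexed as one lowercased token (the default analyzer above) and is not Elasticsearch client code; the function names are hypothetical. Term-level queries (term, prefix, wildcard) are not analyzed, so the model lowercases the query string by hand. `ends_with` illustrates the reversed-token idea from point 4: store the value reversed and run a prefix query with the reversed suffix.

```python
import re


def normalize(value: str) -> str:
    # Mirrors the default analyzer: keyword tokenizer (one token) + lowercase.
    # Also applied to query strings, because term/prefix/wildcard queries are
    # not analyzed and the caller has to lowercase them.
    return value.lower()


def exact(indexed: str, term: str) -> bool:
    # term query: the stored token must equal the term exactly.
    return normalize(indexed) == normalize(term)


def wildcard_to_regex(pattern: str) -> str:
    # Wildcard syntax: '*' = any sequence (also empty), '?' = one character.
    # All other characters are matched literally (no escaping in this model).
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def matches_pattern(indexed: str, pattern: str) -> bool:
    # wildcard query: the pattern has to cover the whole token.
    regex = wildcard_to_regex(normalize(pattern))
    return re.fullmatch(regex, normalize(indexed), flags=re.DOTALL) is not None


def begins_with(indexed: str, prefix: str) -> bool:
    # prefix query on the single lowercased token.
    return normalize(indexed).startswith(normalize(prefix))


def ends_with(indexed: str, suffix: str) -> bool:
    # Reversed-token approach: the token is stored reversed (keyword tokenizer
    # + reverse filter) and queried with a prefix query on the reversed suffix.
    reversed_token = normalize(indexed)[::-1]
    return reversed_token.startswith(normalize(suffix)[::-1])


def contains(indexed: str, fragment: str) -> bool:
    # Substring test: what "*fragment*" would do if '*' and '?' inside the
    # fragment were taken literally.
    return normalize(fragment) in normalize(indexed)


if __name__ == "__main__":
    print(exact("aaa", "aa"))           # False
    print(matches_pattern("ab", "a*"))  # True
    print(begins_with("aaa*", "aaa"))   # True
    print(ends_with("xaaa", "aaa"))     # True
    print(contains("aa*aa", "a*"))      # True
    print(contains("aaaa", "a*"))       # False
```

Only `contains` treats `*` literally here. Getting the same behaviour from a real wildcard query requires escaping the fragment, which is the open question in point 5.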

Hopefully someone has some tips to point me in the right direction.

Regards,
Daniel


(Karussell) #2

It is possible to use multiple types (and so multiple analyzers) for
one field:
http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html
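A sketch of how that could look for the endsWith case, written as Python dicts that serialize to the JSON Elasticsearch expects. This assumes the `multi_field` syntax of Elasticsearch versions from that time (later versions replaced it with the `fields` mapping parameter and `string` with `text`/`keyword`). The type `product`, the field `name` and the subfield `reversed` are hypothetical. Adding `lowercase` to the reverse analyzer is also an assumption, made so that it matches the lowercased default analyzer.

```python
import json

# Index settings: a custom analyzer named "reverse" built from the keyword
# tokenizer plus the lowercase and reverse token filters.
settings = {
    "analysis": {
        "analyzer": {
            "reverse": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase", "reverse"],
            }
        }
    }
}

# multi_field indexes the same value twice: "name" with the default analyzer
# and "name.reversed" with the reverse analyzer.
mapping = {
    "product": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {"type": "string"},
                    "reversed": {"type": "string", "analyzer": "reverse"},
                },
            }
        }
    }
}


def ends_with_query(suffix: str) -> dict:
    # endsWith becomes a prefix query on name.reversed. Prefix queries are not
    # analyzed, so the suffix is lowercased and reversed here.
    return {"prefix": {"name.reversed": suffix.lower()[::-1]}}


if __name__ == "__main__":
    print(json.dumps(settings, indent=2))
    print(json.dumps(mapping, indent=2))
    print(json.dumps(ends_with_query("XYZ")))  # {"prefix": {"name.reversed": "zyx"}}
```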

I would also suggest using the edge ngram tokenizer/filter to improve the
performance of wildcard searches:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/edgengram-tokenizer.html

There is also an option to generate the grams from the front or the back of the token.
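Here is a rough Python model of why this helps (not the Elasticsearch implementation; the names are hypothetical, and `side` simply mirrors the front/back option). Front edge n-grams are the prefixes of a token and back edge n-grams are its suffixes. If they are indexed, beginsWith and endsWith become plain term lookups instead of prefix or wildcard scans.

```python
from typing import Dict, List, Set


def edge_ngrams(token: str, min_gram: int, max_gram: int, side: str = "front") -> List[str]:
    # side="front" yields prefixes, side="back" yields suffixes, shortest first.
    if side not in ("front", "back"):
        raise ValueError("side must be 'front' or 'back'")
    grams = []
    for n in range(min_gram, min(max_gram, len(token)) + 1):
        grams.append(token[:n] if side == "front" else token[-n:])
    return grams


def build_index(values: List[str], min_gram: int, max_gram: int, side: str) -> Dict[str, Set[int]]:
    # Inverted index: gram -> ids of the documents containing it.
    index: Dict[str, Set[int]] = {}
    for doc_id, value in enumerate(values):
        for gram in edge_ngrams(value.lower(), min_gram, max_gram, side):
            index.setdefault(gram, set()).add(doc_id)
    return index


if __name__ == "__main__":
    print(edge_ngrams("abcd", 1, 3, "front"))  # ['a', 'ab', 'abc']
    print(edge_ngrams("abcd", 1, 3, "back"))   # ['d', 'cd', 'bcd']
    docs = ["apple", "apricot", "grape"]
    front = build_index(docs, 1, 10, "front")
    back = build_index(docs, 1, 10, "back")
    print(sorted(front.get("ap", set())))   # [0, 1]  (beginsWith "ap")
    print(sorted(back.get("ape", set())))   # [2]     (endsWith "ape")
```

There are trade-offs. Every gram is an extra term, so the index grows. Prefixes or suffixes longer than `max_gram` are not indexed at all.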

Peter.

On 2 Nov., 10:43, dawi d.wilmer.1...@googlemail.com wrote:

(dawi) #3

Hi Karussell,

thanks for the hint about using the multi-field type.

But we are using elasticsearch schema-free, so is it possible to define
different analyzers more globally than on a specific field? I am not sure
whether this is possible; at least I could not find a single example of it in
the documentation. There, all mappings are done at the field level.

Concerning the Edge NGram filter: I will try it and see how this influences
performance and index size.

Concerning wildcard queries: Is there any way to escape wildcard
characters (e.g. so that a search for "aa*" finds "aa*")?
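The sketch below shows the behaviour being asked for. It assumes a backslash escape character, which is an assumption: whether Lucene/Elasticsearch wildcard queries honour backslash escapes depends on the version in use. The Python matcher is a stand-in, not Lucene's. It also shows how "contains" could then be expressed as the wildcard pattern `*<escaped fragment>*`. All function names are hypothetical.

```python
import re

WILDCARDS = "*?"
ESCAPE = "\\"  # assumed escape character


def escape_wildcards(text: str) -> str:
    # Put a backslash in front of every wildcard and of the escape character
    # itself, so that they are matched literally.
    out = []
    for ch in text:
        if ch in WILDCARDS or ch == ESCAPE:
            out.append(ESCAPE)
        out.append(ch)
    return "".join(out)


def wildcard_regex(pattern: str) -> str:
    # Translate a wildcard pattern with backslash escapes into a regex.
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == ESCAPE and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char: literal
            i += 2
            continue
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def wildcard_match(value: str, pattern: str) -> bool:
    # The pattern has to cover the whole (single-token) value.
    return re.fullmatch(wildcard_regex(pattern), value, flags=re.DOTALL) is not None


def contains_pattern(fragment: str) -> str:
    # "contains" as a wildcard query: *<escaped fragment>*
    return "*" + escape_wildcards(fragment) + "*"


if __name__ == "__main__":
    print(escape_wildcards("aa*"))                          # aa\*
    print(wildcard_match("aa*", "aa\\*"))                   # True
    print(wildcard_match("aaa", "aa\\*"))                   # False
    print(wildcard_match("aa*aa", contains_pattern("a*")))  # True
    print(wildcard_match("aaaa", contains_pattern("a*")))   # False
```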

Regards,
Daniel

