I have to implement a search backend for our product to replace the
old SQL queries.
The index is built using the following default analyzer settings:
index.analysis.analyzer.default.filter = lowercase
index.analysis.analyzer.default.tokenizer = keyword
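For illustration, here is a minimal Python model of what these two settings do to a field value. It is not Elasticsearch code, and the function name `analyze_default` is made up for this sketch. The keyword tokenizer emits the whole value as a single token, and the lowercase filter then lowercases it.

```python
from typing import List


def analyze_default(value: str) -> List[str]:
    """Model of keyword tokenizer + lowercase filter (illustrative only)."""
    # The keyword tokenizer does not split the input: the entire value
    # becomes exactly one token. The lowercase filter lowercases it.
    return [value.lower()]


print(analyze_default("Foo Bar*"))  # ['foo bar*']
```

Because every value is indexed as one term, all five search methods below operate on whole-value terms rather than on individual words.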
Now I have to provide the following search methods:
exact
matchesPattern
beginsWith
endsWith
contains
exact is implemented as a term query.
"aaa" matches exactly "aaa"
"aa" matches "aa" but not "aaa"
matchesPattern is implemented as a wildcard query.
"a" matches "a"
"a*" matches "a", "aa", "ab", "a*"
Question here: how can I search for literal wildcard characters in fields?
Something like an escaped "a\*" that would match "a*" but not
"a" and "aa".
beginsWith is implemented as a prefix query.
"aaa" matches "aaa", "aaaa", "aaa*"
"aa*" matches "aa*", "aa*aa" but not "aaa"
endsWith
Should behave like beginsWith, but match at the end of the value.
But what is the best way to implement this feature?
Is it possible to define multiple analyzers per field and to use the
reverse filter for this?
index.analysis.analyzer.reverse.filter=reverse
index.analysis.analyzer.reverse.tokenizer=keyword
Or do I have to store a reversed version of the field in the index and
use the prefix query?
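The second option can be sketched as follows, under the assumption that a reversed copy of each value is stored in a separate field (called `name_reversed` here, a made-up name). An endsWith search then becomes a prefix query on the reversed suffix. This only models the idea in Python; it does not show how the reversed field gets populated in the index.

```python
def reversed_field_value(value: str) -> str:
    """Value to store in the reversed field: lowercased, then reversed."""
    # Simple code point reversal; good enough for an illustration.
    return value.lower()[::-1]


def ends_with_query(reversed_field: str, suffix: str) -> dict:
    """Build a prefix query against the reversed field."""
    # A suffix of the original value is a prefix of the reversed value.
    return {"query": {"prefix": {reversed_field: suffix.lower()[::-1]}}}


stored = reversed_field_value("Report.PDF")
query = ends_with_query("name_reversed", ".pdf")
print(stored)  # fdp.troper
print(query)
```

If the reverse token filter can be attached to a second analyzer for the same source value, the reversal would happen at index time instead of in application code, and only the query side shown above would remain.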
contains
"a" matches "aa", "aaaa"
"a*" matches "a*" or "aa*aa" but not "aaaa"
Could be implemented using a wildcard query if character escaping is
possible there.
Hopefully someone has some tips to point me in the right direction.
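A minimal sketch of that idea: escape the user input, then wrap it in "*" on both sides. It assumes the wildcard query accepts a backslash as the escape character, which should be verified for the version in use. The field name `name` and the helper names are hypothetical.

```python
import re


def escape_wildcards(text: str) -> str:
    """Prefix backslash, '*' and '?' with a backslash (assumed escape)."""
    return re.sub(r"([\\*?])", r"\\\1", text)


def contains_query(field: str, text: str) -> dict:
    """Build a wildcard query body matching terms that contain `text`."""
    # Wildcard queries are not analyzed, so lowercase the input by hand.
    pattern = "*" + escape_wildcards(text.lower()) + "*"
    return {"query": {"wildcard": {field: pattern}}}


# "name" is a hypothetical field name.
print(contains_query("name", "a*"))
```

Note that a pattern starting with "*" cannot use the sorted term dictionary to narrow the search, so this kind of query tends to get slow on fields with many distinct terms.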
But we are using Elasticsearch schema-free, so is it possible to define
different analyzers more globally than on a specific field? I am not sure if
this is possible; at least I could not find an example that does so in
the documentation. There, all mappings are done at the field level.
Concerning the Edge NGram filter: I will try it and see how this influences
performance and index size.
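To get a feeling for the index-size effect before trying it, here is a small Python model of what an edge n-gram filter emits for a single term. The `min_gram` and `max_gram` values are illustrative parameters, not the filter's defaults.

```python
from typing import List


def edge_ngrams(term: str, min_gram: int = 1, max_gram: int = 10) -> List[str]:
    """Return the leading substrings of `term` with lengths min_gram..max_gram."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("aaab", 1, 3))  # ['a', 'aa', 'aaa']
```

Each value is indexed as up to `max_gram - min_gram + 1` terms instead of one, which is where the extra index size comes from. In exchange, a beginsWith search can be a plain term lookup on one of those n-grams.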
Concerning wildcard queries: Is there no possibility to escape wildcard
characters (e.g. so that a search for "aa*" finds only the literal "aa*")?