Use lucene query within ES-API

Tobsucht · April 3, 2018, 1:00pm

I'm building complex queries in my application. Token streams are mapped to queries by a custom query builder subclassed from "org.apache.lucene.util.QueryBuilder".

This produces Lucene queries which, as far as I understand, cannot be used directly within the ES API. I have written a small "query bridge" that maps Lucene queries to the corresponding Elasticsearch query builders, but is that really necessary? Isn't there already a solution?

Tobsucht · April 6, 2018, 10:07am

Ok, I figured out that it might be better to move the query building process completely into ES.

Actually, I don't know whether I really need a custom query builder or not, so here is my scenario:

I have a custom analyzer for the German language. Splitting up compound words results in the sub-terms having the same position in the token stream as the compound word. With a "standard" query builder, this results in a bool query that puts every sub-term into a should clause.

This might be problematic because the Lucene score is multiplied by a scoring value that represents "the business logic" for the current document. As a result, a document that matches only one sub-term of the compound word could be placed at the top of the result set because it has a very high "business logic" score.
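
To make the concern concrete, here is a minimal sketch of the multiplication described above. The class name, the method name, and all score values are hypothetical and chosen only for illustration; they do not come from a real index.

```java
final class ScoreExample {

    // Final score as described in the post: text relevance times a business factor.
    public static double finalScore(double luceneScore, double businessScore) {
        return luceneScore * businessScore;
    }

    public static void main(String[] args) {
        // Hypothetical numbers, chosen only to illustrate the effect.
        double fullMatch = finalScore(3.0, 1.5);    // matches the compound and its sub-terms
        double partialMatch = finalScore(1.0, 8.0); // matches a single sub-term only

        System.out.println("full match:    " + fullMatch);
        System.out.println("partial match: " + partialMatch);
        System.out.println("partial ranks first: " + (partialMatch > fullMatch));
    }
}
```

With should clauses only, nothing forces the weakly matching document out of the result set, so a large enough business factor can lift it above the full match.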

Can this problem be solved with intelligent sub-queries or do I need to place my own querybuilder into ES?

Tobsucht · April 9, 2018, 2:46pm

I found out that the problem is that I'm using decompounding during search time, which is not a great idea.

My analyzer adds information about which terms are good candidates for fuzzy matching and which are not. I want to use this information in the query builder to decide whether to use a simple TermQuery or a FuzzyQuery. Is it possible to override the standard query builder that is used in MultiMatchQuery? If so, is there an example of how this is done?

jimczi · April 11, 2018, 12:40pm

No, it's not possible. It's a private class that applies the logic defined in the match query options, and I don't think we should allow customizations at this level. You'll need to define your own query parser (that uses your custom query builder) in a plugin.
You can also look at the custom field mapper option:
https://www.elastic.co/guide/en/elasticsearch/plugins/6.2/mapper.html
If you define your own field type you can override MappedFieldType#termQuery which is called by the query parsers to build a term query on the field. You cannot access the attributes from the token stream in this method so it depends on what you need to determine if a term is a good candidate for fuzzy matching or not (if you need more than the term itself).

Tobsucht · April 12, 2018, 12:03pm

The problem with using MappedFieldType is that I need more information than just the term.

What I'm trying to do: for every token in a search query I do a dictionary lookup. If it's present => don't use fuzzing. As the analyzer is for the German language, I also try to decompose the token. If I can fully decompose it => don't use fuzzing. If it is only partially decomposed, I check whether the sub-terms form a prefix of the token => use the length of that prefix as prefixLength for the fuzzy query.
Additionally, I don't want numbers to be fuzzed.

So I think the only solution is to build my own query. I only need to change a few lines compared to MatchQuery. Am I right that it is not possible to simply extend MatchQuery and its parser and override a few methods?
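
The decision rules above can be sketched in plain Java, independent of Lucene and Elasticsearch. Everything here is an assumption made for illustration: the names FuzzyDecision, Decision and decide are hypothetical, the dictionary is a simple in-memory set, "numbers" are taken to mean tokens consisting of digits only, and decomposition is a greedy longest-match that ignores German linking elements and does no backtracking. A real decompounder would be more involved.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class FuzzyDecision {

    /** Result for one token: fuzz it or not, and how many leading characters must match exactly. */
    public static final class Decision {
        public final boolean fuzzy;
        public final int prefixLength;

        Decision(boolean fuzzy, int prefixLength) {
            this.fuzzy = fuzzy;
            this.prefixLength = prefixLength;
        }

        @Override
        public String toString() {
            return fuzzy ? "fuzzy(prefixLength=" + prefixLength + ")" : "exact";
        }
    }

    private static final Decision EXACT = new Decision(false, 0);

    public static Decision decide(String token, Set<String> dictionary) {
        // Numbers are never fuzzed (assumption: tokens made of digits only).
        if (!token.isEmpty() && token.chars().allMatch(Character::isDigit)) {
            return EXACT;
        }
        // Known word: no fuzzing.
        if (dictionary.contains(token)) {
            return EXACT;
        }
        // Greedy decomposition: repeatedly strip the longest dictionary word from the front.
        int pos = 0;
        while (pos < token.length()) {
            int next = -1;
            for (int end = token.length(); end > pos; end--) {
                if (dictionary.contains(token.substring(pos, end))) {
                    next = end;
                    break;
                }
            }
            if (next < 0) {
                break; // no dictionary word starts at this position
            }
            pos = next;
        }
        if (pos == token.length()) {
            return EXACT; // fully decomposed (also covers the empty token)
        }
        // Partially decomposed, or not at all: fuzz, but pin the known prefix.
        return new Decision(true, pos);
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("haus", "boot", "dach"));
        for (String token : new String[] {"haus", "hausboot", "hausbot", "12345", "xyz"}) {
            System.out.println(token + " -> " + decide(token, dictionary));
        }
    }
}
```

Inside a custom query builder, the outcome could then pick between a TermQuery and a FuzzyQuery, with prefixLength passed on as the fuzzy query's prefix length.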

jimczi · April 19, 2018, 10:35pm

You can keep a link to the dictionary in the MappedFieldType and perform the decomposition in MappedFieldType#termQuery and MappedFieldType#fuzzyQuery if you want, but I agree that your use case is better suited for a TokenFilter or a query parser.
Regarding the extensibility of the MatchQuery, we could provide some extension points, but you'd still need to write a plugin, so this wouldn't save much. The logic in this query is quite simple, so you can use it as a starting point for your query parser and freely add the options that you need.

system · May 17, 2018, 10:35pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
