Use Lucene query within ES-API

I'm building complex queries in my application: token streams are mapped to queries using a custom query builder subclassed from "org.apache.lucene.util.QueryBuilder".

This produces Lucene queries which, as far as I understand, cannot be used directly within the ES-API. I have written a small "query bridge" which maps Lucene queries to the corresponding Elasticsearch query builders, but is that really necessary? Isn't there already a solution?

Ok, I figured out that it might be better to move the query building process completely into ES.

Actually, I don't know whether I really need a custom query builder or not, so here is my scenario:

I have a custom analyzer for the German language. Splitting up compound words results in the sub-terms having the same position in the token stream as the compound word. When using a "standard" query builder, this results in a bool query that puts every sub-term into a should clause.

This might be problematic because the Lucene score is multiplied by a scoring value that represents "the business logic" for the current document. So a document that matches only one sub-term of the compound word may be placed at the top of the result set because it has a very high "business logic" score.
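
To make the ranking concern concrete, here is a minimal sketch in plain Java. It is only an illustration: the class and method names are hypothetical, all numbers are made up, and it assumes (as a strong simplification of real Lucene scoring) that every matching should clause contributes the same per-clause score.

```java
/** Hypothetical, simplified model of the ranking problem; all numbers are made up. */
final class BusinessScoreExample {
    /**
     * Simplification: every matching should clause contributes the same
     * per-clause score. Real Lucene scoring is more involved.
     */
    static double textScore(int matchedSubTerms, double scorePerClause) {
        return matchedSubTerms * scorePerClause;
    }

    /** The text score is multiplied by the business-logic score of the document. */
    static double finalScore(int matchedSubTerms, double scorePerClause, double businessScore) {
        return textScore(matchedSubTerms, scorePerClause) * businessScore;
    }

    public static void main(String[] args) {
        // Document A matches 1 of 3 sub-terms but has a high business score.
        double partial = finalScore(1, 1.0, 5.0);
        // Document B matches all 3 sub-terms but has a low business score.
        double full = finalScore(3, 1.0, 1.0);
        System.out.println("partial match: " + partial + ", full match: " + full);
    }
}
```

Under these assumed numbers the partially matching document ends up with 5.0 and the fully matching one with 3.0, so the weaker text match ranks first.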

Can this problem be solved with intelligent sub-queries, or do I need to place my own query builder into ES?

I found out that the problem is that I'm using decompounding during search time, which is not a great idea.

My analyzer adds information about which terms are good candidates for fuzzy matching and which are not. I want to use this information in the query builder to decide whether to use a simple TermQuery or a FuzzyQuery. Is it possible to override the standard query builder that is used in MultiMatchQuery? If so, is there an example of how this is done?

No, it's not possible. It's a private class that applies the logic defined in the match query options, and I don't think we should allow customizations at this level. You'll need to define your own query parser (that uses your custom query builder) in a plugin.
You can also look at the custom field mapper option:
https://www.elastic.co/guide/en/elasticsearch/plugins/6.2/mapper.html
If you define your own field type you can override MappedFieldType#termQuery which is called by the query parsers to build a term query on the field. You cannot access the attributes from the token stream in this method so it depends on what you need to determine if a term is a good candidate for fuzzy matching or not (if you need more than the term itself).

The problem with using MappedFieldType is that I need more information than just the term.

What I'm trying to do: for every token in a search query I do a dictionary lookup. If the token is present => don't use fuzzy matching. As the analyzer is for the German language, I also try to decompose the token. If I can fully decompose it => don't use fuzzy matching. If it is only partially decomposed, I check whether the sub-terms form a prefix of the token => use the length of that prefix as prefixLength for the fuzzy query.
Additionally, I don't want numbers to be fuzzed.
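
The decision rules above can be sketched in plain Java, independent of Lucene and Elasticsearch. This is only a rough sketch under assumptions: FuzzyDecision, prefixLengthFor and NO_FUZZY are hypothetical names, the dictionary is assumed to be a simple in-memory set, and the sub-terms are assumed to be delivered by the decompounder in token order and already normalized (e.g. lowercased).

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of Lucene or Elasticsearch. */
final class FuzzyDecision {
    /** Sentinel: build a plain term query instead of a fuzzy query. */
    static final int NO_FUZZY = -1;

    /**
     * Returns NO_FUZZY, or the prefix length to use for a fuzzy query.
     * Assumes token, dictionary entries and sub-terms are already normalized
     * and that subTerms comes from the decompounder in token order.
     */
    static int prefixLengthFor(String token, Set<String> dictionary, List<String> subTerms) {
        // Numbers (and empty tokens) are never fuzzed.
        if (token.chars().allMatch(Character::isDigit)) {
            return NO_FUZZY;
        }
        // Known dictionary word: exact matching only.
        if (dictionary.contains(token)) {
            return NO_FUZZY;
        }
        // Count how many leading characters are covered by consecutive sub-terms.
        int covered = 0;
        for (String subTerm : subTerms) {
            if (subTerm.isEmpty() || !token.startsWith(subTerm, covered)) {
                break;
            }
            covered += subTerm.length();
        }
        // Fully decomposed: every character belongs to a known sub-term.
        if (covered == token.length()) {
            return NO_FUZZY;
        }
        // Partially (or not at all) decomposed: fuzz only the part after the prefix.
        return covered;
    }
}
```

A custom query builder could call such a helper per token and, when the result is not NO_FUZZY, pass it as the prefixLength argument of Lucene's FuzzyQuery(Term, int maxEdits, int prefixLength) constructor. Otherwise it would fall back to a TermQuery.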

So I think the only solution is to build my own query. I only need to change a few lines compared to MatchQuery: is it not possible to just extend MatchQuery and its parser and override a few methods?

You can hold a reference to the dictionary in the MappedFieldType and perform the decomposition in MappedFieldType#termQuery and MappedFieldType#fuzzyQuery if you want, but I agree that your use case is better suited for a TokenFilter or a query parser.
Regarding the extensibility of the MatchQuery, we could provide some extension point, but you'd still need to write a plugin, so this wouldn't save much. The logic in this query is quite simple, so you can use it as a starting point for your query parser and freely add the options that you need.
