Match only?

ntak · February 10, 2022, 7:09pm

Hello!

I'm new to ES and I'm trying to figure out how to make a query which asserts the following:

All given tokens are present in the field - regardless of the order
No other tokens are present.

Ideally, I'd also be able to confirm that the frequency of each token in the field matches.

e.g. my index contains the following documents

{"message": "hello world"}
{"message": "hello hello world"}
{"message": "hello hello world byebye"}

And I'd need to be able to generate a query from "hello world hello" which would match the second document only. This solution needs to be scalable.

Solution 1:
Use regex queries. This is rather slow so I'd avoid if possible.

Solution 2:
Pass the field through the analyzer to get the tokenized/clean version, count the occurrences of each unique term, sort the counts by term in alphanumerical order, and add a field which contains this sorted list of counts. Finally, perform a query which asserts:

That every token is present.
That the sorted unique-token count matches as a keyword.

This new field asserts that the number of unique tokens is the same and that each token's count is the same.

e.g.

{"message": "hello world", "ucount":"1/1"}
{"message": "hello hello world", "ucount":"2/1"}
{"message": "hello hello world byebye", "ucount":"1/2/1"}
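
A minimal Python sketch of how the "ucount" value could be computed. The `simple_tokens` helper is a hypothetical stand-in for the index analyzer (it only lowercases and splits on whitespace), so it would only agree with ES for very simple analyzers:

```python
from collections import Counter


def simple_tokens(text):
    # Hypothetical stand-in for the ES analyzer (assumption):
    # lowercase the text and split it on whitespace.
    return text.lower().split()


def ucount(tokens):
    # Count each unique token, order the counts by token
    # (alphanumerically), and join them with "/".
    counts = Counter(tokens)
    return "/".join(str(counts[token]) for token in sorted(counts))


for message in ("hello world", "hello hello world", "hello hello world byebye"):
    print({"message": message, "ucount": ucount(simple_tokens(message))})
```

Because the counts are ordered by token rather than by position, "hello world hello" and "hello hello world" produce the same value.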

The problem with this solution is that I can't call the analyzer in bulk to tokenize and clean the fields, and calling it once per entry is not realistic in my case. I could work around this by reimplementing the analyzer in my own code, but that becomes a potential source of bugs if the analyzer ever changes in a later version of ES.

I'm hopeful that there is a simpler way - or at least quicker - to do this - some kind of "match only" - or maybe a way to get a similar behaviour by using a combination of "must" and "must_not".

I'm using ES 7 and the python client interface.
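
For reference, here is a rough sketch of how the Solution 2 query body could be built in Python. It assumes the tokens have already been analyzed, that "ucount" is mapped as a keyword field, and the function name `build_exact_bag_query` is just illustrative:

```python
from collections import Counter


def build_exact_bag_query(tokens, text_field="message", count_field="ucount"):
    # tokens: already-analyzed tokens of the search text (assumption).
    counts = Counter(tokens)
    # Same signature as stored in the count field: counts ordered by token.
    signature = "/".join(str(counts[token]) for token in sorted(counts))
    # One term clause per unique token asserts that every token is present.
    filters = [{"term": {text_field: token}} for token in sorted(counts)]
    # A term clause on the keyword field asserts that the number of unique
    # tokens and their individual counts are the same.
    filters.append({"term": {count_field: signature}})
    # Filter context: no scoring is needed for a yes/no match.
    return {"query": {"bool": {"filter": filters}}}


query = build_exact_bag_query(["hello", "world", "hello"])
print(query)
# With the 7.x Python client this body could then be sent with something like
# es.search(index="my-index", body=query), where "my-index" is a placeholder.
```

The term clauses only line up with the index if the tokens passed in are exactly what the analyzer would produce, which is the same limitation described above.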

Many thanks

Tomo_M · February 12, 2022, 3:45pm

You could use a match_phrase query. To get token frequencies, you could use the Term vectors API.

ntak · February 15, 2022, 6:40pm

Thanks for the quick answer. I've considered using match_phrase, but it does not seem to be agnostic to term order ("hello world hello" does not match "hello hello world").

I've also looked at the term vectors but it seems to be only available at the document level and not at the field level.
