Match query unexpected behavior with token filter

redshadow · November 16, 2016, 6:54pm

I have an analyzer whose token filter emits redundant tokens. For instance:

Original string: "TokyoJapan Samurai"

After tokenizing:
"TokyoJapan"
"Samurai"

After the token filter:
"tokyo", "japan", "tokyojapan"
"samurai"

So we get a total of 4 tokens, i.e. ("tokyo", "japan", "tokyojapan", "samurai").
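To make the token stream above concrete, here is a rough Python imitation of that analysis chain. It is only a sketch: the function name `analyze` is made up, and the splitting rule (break on a lowercase-to-uppercase change, keep the whole word too, lowercase everything) is an assumption about what the token filter does, not the actual filter configuration.

```python
import re

def analyze(text):
    """Imitate the analyzer: one list of tokens per whitespace-separated word."""
    positions = []
    for word in text.split():
        # Insert a break wherever a lowercase letter is followed by an uppercase one,
        # e.g. "TokyoJapan" -> "Tokyo Japan".
        parts = re.sub(r"([a-z])([A-Z])", r"\1 \2", word).split()
        tokens = [part.lower() for part in parts]
        # Keep the unsplit word as an extra (redundant) token when a split happened.
        if len(parts) > 1:
            tokens.append(word.lower())
        positions.append(tokens)
    return positions

indexed = analyze("TokyoJapan Samurai")
# indexed == [["tokyo", "japan", "tokyojapan"], ["samurai"]]
```

Tokens that come from the same original word are grouped together, which matters for how the match query combines them.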

Now if I issue an ANDed match query, using the same analyzer, for the string "OsakaJapan Samurai", the tokens generated after the token filter (just like above) should be:

"osaka", "japan", "osakajapan"
"samurai"

However, this matches a field containing "TokyoJapan Samurai" as well. The reason is that even with AND as the operator, the match query internally looks for:
(1) any of the tokens generated from "OsakaJapan", i.e. ("osaka" OR "japan" OR "osakajapan")
AND
(2) any of the tokens generated from "Samurai", i.e. ("samurai")

i.e. ("osaka" OR "japan" OR "osakajapan") AND ("samurai")

Since "TokyoJapan Samurai" contains both "japan" and "samurai", it satisfies both groups.
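The behavior described here can be reproduced without Elasticsearch. The sketch below is plain Python, not the real query execution; the function name `matches_and_of_ors` is invented for illustration. It treats each query word's tokens as an OR group and ANDs the groups together.

```python
def matches_and_of_ors(query_groups, doc_tokens):
    """True if every group has at least one token present in the document."""
    return all(
        any(token in doc_tokens for token in group)
        for group in query_groups
    )

# Tokens produced for the query "OsakaJapan Samurai", grouped per original word.
query_groups = [["osaka", "japan", "osakajapan"], ["samurai"]]

# Tokens indexed for the document "TokyoJapan Samurai".
doc_tokens = {"tokyo", "japan", "tokyojapan", "samurai"}

unwanted_match = matches_and_of_ors(query_groups, doc_tokens)
# unwanted_match is True: "japan" alone satisfies the first group.
```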

Ideally, I would have liked it to be:
MATCH (("osaka" and "japan") OR "osakajapan") AND "samurai"
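One way to get exactly this expression might be to spell it out by hand as a bool query instead of relying on match. Below is a rough sketch of such a request body, built as a Python dict. The field name `title` is hypothetical, and the term queries assume the values are already lowercased the way the analyzer would emit them, because term queries are not analyzed.

```python
import json

FIELD = "title"  # hypothetical field name, not from the original post

def term(value):
    """Build a term query clause for the assumed field."""
    return {"term": {FIELD: value}}

desired_query = {
    "query": {
        "bool": {
            "must": [
                {
                    # (("osaka" AND "japan") OR "osakajapan")
                    "bool": {
                        "should": [
                            {"bool": {"must": [term("osaka"), term("japan")]}},
                            term("osakajapan"),
                        ],
                        "minimum_should_match": 1,
                    }
                },
                # AND "samurai"
                term("samurai"),
            ]
        }
    }
}

# JSON body that could be sent to the _search endpoint.
request_body = json.dumps(desired_query, indent=2)
```

The drawback is that the query has to be assembled by the application for every search string, which is the part the analyzer normally does.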

This way the following docs would match or not match:

"OsakaJapan Samurai" - matches, obviously
"osakAjapan Samurai" - matches only because "osakajapan" matches, and not because "osak" and "Ajapan" exist
"Osaka Samurai" - matches because 'osaka' and 'samurai' match.
"TokyoJapan samurai" - doesn't match, because neither ("tokyo" and "japan") nor "tokyojapan" matches.

Any idea how I can achieve this behavior?

I believe I'd have to use a different search analyzer from the indexing analyzer, but I may be wrong.
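For what it is worth, a separate search analyzer would be declared on the field mapping with `search_analyzer`. The sketch below is untested and everything named in it is an assumption: the analyzer names `split_and_keep` and `keep_whole`, the filter name `word_delimiter_keep`, the field name `title`, and the guess that the index-time filter is a `word_delimiter` filter with `preserve_original` enabled.

```python
# Analysis settings: split at index time, keep words whole at search time.
analysis = {
    "filter": {
        "word_delimiter_keep": {  # hypothetical filter name
            "type": "word_delimiter",
            "preserve_original": True,
        }
    },
    "analyzer": {
        "split_and_keep": {  # hypothetical index-time analyzer
            "type": "custom",
            "tokenizer": "whitespace",
            # Split before lowercasing, since the split relies on case changes.
            "filter": ["word_delimiter_keep", "lowercase"],
        },
        "keep_whole": {  # hypothetical search-time analyzer
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["lowercase"],
        },
    },
}

# Field mapping fragment using a different analyzer for searching.
field_mapping = {
    "title": {  # hypothetical field name
        "type": "text",
        "analyzer": "split_and_keep",
        "search_analyzer": "keep_whole",
    }
}
```

With this setup the query "OsakaJapan Samurai" would only produce "osakajapan" and "samurai", so "TokyoJapan Samurai" would no longer match, but neither would a document containing just "Osaka Samurai".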

