Searching tokenized field containing "." returns unexpected results

I have a newly tokenized field of type text for domains. It is a reverse path hierarchy tokenizer on the "."
So a field like "adservice.google.com" becomes the following 3 tokens:
com
google.com
adservice.google.com

I assumed I could then search the field, from say kibana, with the syntax:
+Domain:google.com
But it finds all fields with a "com" token, so I'll get microsoft.com and amazon.com, etc.
I tried encasing in quotes and escaping the ".", but neither helped.

However:
A single wildcard, or regex both work as expected:
+Domain:google?com
+Domain:/google.com/

Can someone point me to an explanation of why the original syntax returns unexpected results?
+Domain:google.com

Edit: elastic stack 6.3.2

It happens because you apply the same tokenization on the search side as well. So basically when you search for +Domain:google.com it is translated into +Synonym(domain:com domain:google.com). To avoid this you need to use a different analyzer for searching that uses keyword tokenizer instead of reverse path hierarchy tokenizer.

1 Like

Wow, I didn't know that search_analyzer was even a thing. Thank you for that pointer!
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.