Genre Expansion in Elasticsearch 6.1


(Alexander Zheludkov) #1

Hello All!

I need a hierarchical structure of synonyms for search, as described here:
https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html

When I search for "pet", I want to find documents containing "cat" and "dog", and so on. Is this possible in Elasticsearch 6.1? Could you give me an example?


(David Pilato) #2

Yes. Use a Synonym TokenFilter in your analyzer. https://www.elastic.co/guide/en/elasticsearch/reference/6.1/analysis-synonym-tokenfilter.html
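As a rough illustration, the index settings for such an analyzer might look like the sketch below. The names `pet_synonyms` and `pet_analyzer` and the two sample rules are placeholders chosen for this example, not something prescribed by the documentation. The dictionary is the JSON body you would send when creating the index.

```python
import json

# Sketch of index settings with a synonym token filter.
# "pet_synonyms" and "pet_analyzer" are hypothetical names; the rules are
# illustrative samples in the Solr synonym format.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "pet_synonyms": {
                    "type": "synonym",
                    "synonyms": [
                        "cat => cat, pet",
                        "dog => dog, pet",
                    ],
                }
            },
            "analyzer": {
                "pet_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Lowercase first so the rules match regardless of case.
                    "filter": ["lowercase", "pet_synonyms"],
                }
            },
        }
    }
}

if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(INDEX_SETTINGS, indent=2))
```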


(Alexander Zheludkov) #3

I tried Synonym TokenFilter, but I can't make it hierarchical. For example, if I write something like this in Solr:

pet => cat, dog, bird
cat => kitten, kitty
dog => chow chow, malamute
bird => parrot, hawk

and then search for "pet", I find documents containing "cat" and "dog", but not ones containing "chow chow". Of course, I could write all the lower-level synonyms in a single rule, but when I try to add about 70,000 low-level synonyms, Elasticsearch doesn't create the index (I waited for an hour and nothing changed).

I read that the Synonym Graph Token Filter could be useful, but I can't find a proper implementation.


(David Pilato) #4

I read that the Synonym Graph Token Filter could be useful, but I can't find a proper implementation.

Do you mean this? https://www.elastic.co/guide/en/elasticsearch/reference/6.1/analysis-synonym-graph-tokenfilter.html

Anyway, I think it should behave the same way as it does in Solr, but I'm not an expert on that part. Maybe @jimczi could tell you more?


(Jimferenczi) #5

The example in the guide is a bit misleading. It's not a hierarchical structure: only one rule gets applied to each term.
The idea of that example is to show how to rewrite a single term into multiple terms: cat is rewritten to cat and pet, and kitten is expanded to cat, pet, kitten. As you can see, each rule in the example contains all of its expansions.
It becomes tricky when you mix indexing and querying. If you set a synonym filter with your example rules at both index and query time, a document that contains pet is indexed as cat, dog, bird, and a query for pet in fact searches for cat, dog, bird. However, a document that contains dog has already been rewritten into chow chow, malamute at index time, so dog never matches. The synonym_graph filter doesn't help with this; you should revise your rules based on how the synonyms are applied.
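One possible way to revise the rules, sketched in Python: follow the guide's pattern and give every term a single rule that lists the term plus all of its ancestors. Because chains are not followed, each rule has to carry the whole chain itself. The hierarchy below is copied from the rules posted earlier in this thread. The function names are made up for this sketch, and it assumes each term has exactly one parent and that the generated rules are used at index time only.

```python
# Parent -> children, copied from the rules posted earlier in the thread.
HIERARCHY = {
    "pet": ["cat", "dog", "bird"],
    "cat": ["kitten", "kitty"],
    "dog": ["chow chow", "malamute"],
    "bird": ["parrot", "hawk"],
}


def build_parent_map(hierarchy):
    """Invert parent -> children into child -> parent (one parent assumed)."""
    parents = {}
    for parent, children in hierarchy.items():
        for child in children:
            parents[child] = parent
    return parents


def ancestors(term, parents):
    """Return the chain of ancestors of `term`, nearest first."""
    chain = []
    seen = {term}
    while term in parents:
        term = parents[term]
        if term in seen:
            raise ValueError("cycle detected at %r" % term)
        seen.add(term)
        chain.append(term)
    return chain


def genre_expansion_rules(hierarchy):
    """Emit one self-contained rule per term: the term plus all its ancestors."""
    parents = build_parent_map(hierarchy)
    rules = []
    for term in sorted(parents):
        expansion = [term] + ancestors(term, parents)
        rules.append("%s => %s" % (term, ", ".join(expansion)))
    return rules


RULES = genre_expansion_rules(HIERARCHY)

if __name__ == "__main__":
    print("\n".join(RULES))
```

With rules like these, a document containing chow chow would also be indexed under dog and pet, so a plain query for pet can match it without any query-time expansion. The number of rules grows with the number of terms, not with the size of each subtree.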

