I tried the Synonym Token Filter, but I can't make it hierarchical. For example, if I write something like this in Solr format:
pet => cat, dog, bird
cat => kitten, kitty
dog => chow chow, malamute
bird => parrot, hawk
and search for "pet", I find documents containing "cat" and "dog", but not ones containing "chow chow". Of course I could write all the lower-level synonyms in one line, but when I try to add about 70,000 low-level synonyms, Elasticsearch doesn't create the index (I waited for an hour and nothing changed).
I read that the Synonym Graph Token Filter could be useful, but I can't find a proper implementation.
The example in the guide is a bit misleading. It's not a hierarchical structure: only one rule gets applied to a given term.
The idea of that example is to show how to rewrite a single term into multiple terms: cat is rewritten to cat and pet, and kitten is expanded to cat, pet, kitten. As you can see, each rule in the example contains all of its expansions.
It gets tricky when you mix indexing and querying. If you set up a synonym filter with your example rules at both index and query time, a document that contains pet is indexed as cat, dog, bird, and a query for pet actually searches for cat, dog, bird. However, documents that contain dog have already been rewritten to chow chow, malamute, so dog never matches. The synonym_graph filter does not help with this; you should revise your rules based on how the synonyms are applied.
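Since only one rule is applied per term, one way to revise the rules is to flatten the hierarchy offline so that each rule already lists every lower-level term. Below is a minimal Python sketch of that idea, not an official tool: the helper names (parse_rules, expand, flatten) are made up for illustration, and it assumes the filter is applied on one side only (for example only at query time) and that you want the original term kept in its own expansion. It does not address how long index creation takes with a very large synonym file.

```python
RULES = """
pet => cat, dog, bird
cat => kitten, kitty
dog => chow chow, malamute
bird => parrot, hawk
"""


def parse_rules(text):
    """Parse Solr-style 'a => b, c' lines into {term: [direct children]}."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        left, right = line.split("=>")
        rules[left.strip()] = [t.strip() for t in right.split(",") if t.strip()]
    return rules


def expand(term, rules):
    """Return every term reachable from `term`, depth-first, without duplicates."""
    result = []
    seen = {term}  # also protects against cycles in the rules
    stack = list(reversed(rules.get(term, [])))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result.append(current)
        # Push children reversed so they are visited in their written order.
        stack.extend(reversed(rules.get(current, [])))
    return result


def flatten(rules, include_self=True):
    """Build one explicit rule per parent term, listing all its descendants.

    include_self is an assumption of this sketch: it keeps the original
    term on the right-hand side so it is not dropped by the rewrite.
    """
    lines = []
    for term in rules:
        targets = ([term] if include_self else []) + expand(term, rules)
        lines.append(f"{term} => {', '.join(targets)}")
    return lines


flat = flatten(parse_rules(RULES))
for line in flat:
    print(line)
```

The printed lines can be pasted into the synonyms list in place of the hierarchical rules; the rule for pet then contains chow chow directly, so it no longer depends on a second rule being applied.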