I have a synonym rule in synonyms.txt: "auto, vehicle => car".
In the index I have a document containing the string "car", and an analyzer that handles synonyms.
When I search for "auto", for example, it also returns results for "car". But when the query contains a typo in the synonym, something like "vhicle" or "apto", the synonym is not recognized, so the document containing "car" is not returned.
I tried to apply fuzziness, but it only applies to the original value that I have in the index: it handles typos in "car" but not in the synonyms. So only an exact match on the synonym or a fuzzy match on the original string "car" works.
UPD: with this solution there is another issue. If the index contains "cars" and I use "auto" in the query, it can't find "cars". How can this be solved? I want to find a match through the synonym even if there is a typo or a different word form, either in the synonym or in the original document.
Great, thank you so much. The only thing I am still struggling with is typos. If the index contains, say, "vhicles" and the query is "cars", it will not find "vhicles".
Yes, typos should be handled on both sides. So if I have "vhicles" in the index and the typo "aptos" in the query, it should first find the synonym "auto", then convert it to "vehicle", and then use something like a fuzzy search to find "vhicles". At least that is how I built it in my head. Is it even possible?
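To make the flow described above concrete, here is a minimal sketch in plain Python, outside Elasticsearch. It is only an illustration of the idea, not how Elasticsearch resolves synonyms or fuzziness internally. The names `SYNONYMS`, `normalize`, `canonical` and `search` are hypothetical, the plural stripping is a crude stand-in for a real stemmer, and `difflib` from the standard library stands in for fuzzy matching. Both the query term and the indexed terms are first corrected against the synonym vocabulary and then mapped to the rule's target.

```python
from difflib import get_close_matches

# Hypothetical rule set mirroring "auto, vehicle => car".
SYNONYMS = {"auto": "car", "vehicle": "car"}


def normalize(term):
    """Lowercase and strip a trailing "s" (crude stand-in for a stemmer)."""
    term = term.lower()
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def canonical(term, cutoff=0.7):
    """Map a possibly misspelled term to its synonym target.

    The term is matched approximately against every word known to the
    synonym rules; if nothing is close enough it is returned normalized.
    """
    term = normalize(term)
    vocabulary = list(SYNONYMS) + sorted(set(SYNONYMS.values()))
    match = get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    if not match:
        return term
    return SYNONYMS.get(match[0], match[0])


def search(query, indexed_terms):
    """Return indexed terms that resolve to the same target as the query."""
    target = canonical(query)
    return [t for t in indexed_terms if canonical(t) == target]


if __name__ == "__main__":
    print(search("aptos", ["vhicles", "cars", "bus"]))
```

The `cutoff` value is an arbitrary choice for this sketch. Set too low, it would start mapping unrelated short words onto "car", which is the same precision trade-off any fuzzy matching over synonyms has to face.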
A user can create a record, and this record goes into the index. That is why it can contain a typo. So "vhicle" in the index should be found with the query "car", the same way as "car" in the index should be found with the query "vhicle".
This sounds very strange to me. I would not index spelling errors. From my point of view, what you want is not correct: you index the wrong term and then want to use fuzziness to match the right term against the wrong term that is indexed.
I would review this requirement.