I’m trying to achieve Proximity searches with whole wild card,
• (Alice * Bob) NEAR/4 (Sam * * John): find all the documents having the phrase “Alice * Bob” within 4 words distance to “Sam * * John”.
I’ve tried following in the Elasticsearch:
Problem seen:
Nested intervals query did not work as the max_gaps parameter was focusing on max distance and not exact distance.
Scripting is slow and this will require per document field data and text fields are not optimized for such operations. So we had to either use via keyword (we want full text) or enable fielddata (this will use more memory).
@Abhishek_Shinde sorry, that is incorrect as far as needing field data and/or using a keyword field assuming you are only checking the interval gaps as I suggested. Please see the documentation. Yes, scripting is slower but in this specific scenario the proximity operations are most likely going to be what hurts performance vs. the small overhead of the script.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.