and we want to modify it to obtain only exact matches of "but not both". We want the match to be case-insensitive and the words to appear in exactly that order, i.e. "yes, but not both instances" should be a match while "but, you see, both images do not" should not. How can we modify it?
Hey Carlos, thanks for your answer! I tried a simple test with match_phrase as you suggested, but it's not working as expected. I get a lot of results, but not a single one matches "but not both", which is what I'm trying to do.
Please show exactly what you tried as well as what was and was not returned as expected. It would be great if you also could include the mappings of the fields involved in the query.
That's a red flag to me. It's surely not that hard to remove any sensitive information from a small sample of your documents, or create a few sample dummy documents.
Please create a small dummy document with some text in the body field that can be used to demonstrate the issue. When you blank out text in a string the way you have done, it is impossible for us to analyse or recreate the issue. For all we know, the matching component may exist in the parts you have commented out...
When we have access to a dummy document and an associated query that replicates the problem, we can see how it is analysed and help better. Without this, it is a lot of time-consuming guesswork.
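For anyone wanting to check how a field's text is analysed, here is a rough sketch in Python that only builds the request for the `_analyze` API; it does not contact a cluster. The index name `my-index` is hypothetical, and the sample text is the one from the question. Sending this request for the field in question should list the tokens that are actually indexed.

```python
import json

INDEX = "my-index"  # hypothetical index name, not taken from the thread
SAMPLE = "yes, but not both instances"  # sample text from the original question


def analyze_request(field, text):
    """Build an _analyze request that uses the analyzer mapped to `field`."""
    return {
        "method": "GET",
        "path": f"/{INDEX}/_analyze",
        "body": {"field": field, "text": text},
    }


req = analyze_request("body", SAMPLE)
print(req["method"], req["path"])
print(json.dumps(req["body"], indent=2))
```

If the returned tokens do not include every word of the phrase you are searching for, a phrase query on that field cannot match the phrase as written.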
Ok, I'm trying to create a minimal example as suggested, but even if I create the exact same document in a new index, the query doesn't return the document. Are there any other components here that could be affecting the result besides the mapping?
EDIT: I just found that the problem lay with the analyzer. It was removing "but" and "not" since they are stopwords. If I use "body.simple" in the query, it works as expected. Thank you so much for your time!
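To illustrate what happened, here is a small Python sketch. It only approximates the analysis step and is not how Elasticsearch is implemented: the stop word set is an assumed subset of the default English list, and the two tokenizer functions are hypothetical stand-ins for the analyzers on `body` and `body.simple`. It also builds the `match_phrase` query body against `body.simple`.

```python
import json
import re

# Assumption: a small subset of the default English stop word list.
STOPWORDS = {"a", "an", "and", "but", "not", "the", "to", "of"}


def simple_tokens(text):
    """Lowercase and split on non-letters, roughly like the `simple` analyzer."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def stop_tokens(text):
    """Same tokenization, then drop stop words (approximates a stop-word-removing analyzer)."""
    return [t for t in simple_tokens(text) if t not in STOPWORDS]


phrase = "but not both"
print(stop_tokens(phrase))    # only "both" survives stop word removal
print(simple_tokens(phrase))  # all three words survive, in order

# Phrase query against the sub-field that keeps stop words.
query = {"query": {"match_phrase": {"body.simple": {"query": phrase}}}}
print(json.dumps(query, indent=2))
```

With the stop words removed, the phrase is reduced to the single term "both", which would explain getting many results that merely contain that word. Against `body.simple` all three tokens are kept in order, and since the tokens are lowercased the match is also case-insensitive.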
Mmm, to my understanding, the analyzer was working completely consistently with its own documentation. So personally, I think this mis-states where the problem was ...
But I am certainly glad you now understand things better.
I tested the example query against both fields in my example, and the document was not found with either analyser. It therefore seems you were sloppy and posted an incorrect mapping when asked for it, and so wasted everyone's time. If we had seen that an analyser which removes stopwords was being used, the example and the result would have been clear immediately.
This is why you should ALWAYS condense the problem down to a minimal, full and reproducible example. Often that also helps you find the issue.