A standard query for "pizza" returns all of those documents, scored, rather than only the single document I need, which is doc3 in this case. Likewise, if I search for "cheese pizza", I need only doc1 to be returned.
I found this link to be helpful in that it discusses the idea of storing a second field (perhaps "nameCount") with the number of terms in the searched field.
Is there a way to have ES compute the number of terms (after stop-word removal) at index time, and then also compute the number of terms in the query using the same stop words, presumably with a script?
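One possible way to get the per-document term count without a script is the `token_count` field type, which analyzes the value at index time and stores the number of tokens. Below is a minimal sketch that only builds the request bodies as Python dicts. The analyzer name `name_analyzer`, the field `name`, and the sub-field `name.length` are assumptions for illustration. Setting `enable_position_increments` to false is meant to keep removed stop words out of the count. The query-side count is assumed to come from the `_analyze` API using the same analyzer.

```python
import json

# Hypothetical names: analyzer "name_analyzer", field "name", sub-field "name.length".
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Standard analyzer with English stop words enabled.
                "name_analyzer": {"type": "standard", "stopwords": "_english_"}
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "name_analyzer",
                "fields": {
                    # Stores the number of analyzed tokens at index time.
                    "length": {
                        "type": "token_count",
                        "analyzer": "name_analyzer",
                        # Do not count tokens removed by the stop filter.
                        "enable_position_increments": False,
                    }
                },
            }
        }
    },
}


def analyze_body(text):
    """Body for the _analyze API, so the query is tokenized like the field."""
    return {"analyzer": "name_analyzer", "text": text}


def count_tokens(analyze_response):
    """Count tokens in an _analyze response of the form {"tokens": [...]}."""
    return len(analyze_response["tokens"])


def perfect_match_query(text, query_token_count):
    """All query terms must match AND the field must hold the same token count."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": {"query": text, "operator": "and"}}}],
                "filter": [{"term": {"name.length": query_token_count}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(perfect_match_query("cheese pizza", 2), indent=2))
```

One caveat with this sketch: repeated terms are counted once per occurrence, so a query such as "pizza pizza" could line up with a two-term document. Deduplicating the query tokens before counting may help on the query side.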
Can someone recommend a better approach or share links to solutions? Thanks!
Elasticsearch supports text analysis on string fields. There are two supported types for string fields: text and keyword. If your use case requires searching on exact matches, I would recommend changing the field to keyword in your index templates.
@mjunaidmuzammil Thanks. The problem with this is that I cannot rely on exact, character-for-character matches. Perhaps I should have clarified: we need exact matches against the analyzed terms in the inverted index of a full-text field. For example, if someone queries for "pizzas", "Pissa" or "piza", they should also match doc3 only, allowing for fuzzy matching. Is this impossible?
I'm so sorry, I still was not being as clear as I could have been. My wording of "exact" is imprecise. I'm looking for perfect matches, meaning that the analyzed terms I query for must be the only terms in the full-text field. Perhaps a better description would be equal matches. And I do not want to store name fields as keyword types, only text.
Analyzed terms for the example documents above would yield these inverted index entries:
If the query is "pizza", then the only match can be doc3: ["pizza"].
If the query is "cheese pizza", the only match can be doc1: ["cheese", "pizza"].
I hope this makes sense.
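To pin down the matching rule described above, here is a small pure-Python sketch of the intended semantics, independent of Elasticsearch. The toy `analyze` function, the stop-word list, the raw document texts, and the content of doc2 are all assumptions made up for illustration; only the analyzed terms for doc1 and doc3 come from the examples above.

```python
# Assumed stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "with"}


def analyze(text):
    """Toy analyzer: split on whitespace, strip punctuation, lowercase, drop stop words."""
    tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def perfect_matches(query, docs):
    """Return ids of docs whose analyzed term set equals the query's term set."""
    wanted = set(analyze(query))
    return [doc_id for doc_id, name in docs.items() if set(analyze(name)) == wanted]


docs = {
    "doc1": "Cheese Pizza",
    "doc2": "Pepperoni Pizza",  # hypothetical content, not from the thread
    "doc3": "Pizza",
}

if __name__ == "__main__":
    print(perfect_matches("pizza", docs))
    print(perfect_matches("cheese pizza", docs))
```

A plain "contains all query terms" rule would also return doc1 and doc2 for "pizza"; the set-equality check is what rules them out.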
I then want to boost these perfect matches to 1000, say, to indicate these are identical term set matches. I've scoured the ES documentation, and don't see any way to direct ES to consider this type of matching.
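For the boosting part, one option might be to wrap the perfect-match clause in a `constant_score` query with a large boost and combine it with an ordinary `match` clause in a `bool` query. The sketch below only builds the request body as a Python dict. It assumes a hypothetical `name.length` sub-field of type `token_count` on the `name` field, and it assumes the caller already knows the number of analyzed query terms (for example, from the `_analyze` API). The optional `fuzziness` setting is passed through to the `match` clause; whether a given misspelling matches depends on the edit distance allowed.

```python
def boosted_perfect_match_query(text, query_token_count, boost=1000.0, fuzziness=None):
    """Build a query body that gives perfect matches a fixed, large score.

    Field names "name" and "name.length" are assumptions for illustration.
    """
    match_clause = {"query": text, "operator": "and"}
    if fuzziness is not None:
        match_clause["fuzziness"] = fuzziness

    # Perfect match: every query term present AND same number of terms in the field.
    perfect = {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [{"match": {"name": match_clause}}],
                    "filter": [{"term": {"name.length": query_token_count}}],
                }
            },
            "boost": boost,
        }
    }

    # Ordinary relevance-scored match, so partial matches still appear below.
    loose = {"match": {"name": {"query": text}}}

    return {"query": {"bool": {"should": [perfect, loose]}}}


if __name__ == "__main__":
    import json

    print(json.dumps(boosted_perfect_match_query("pizza", 1, fuzziness="AUTO"), indent=2))
```

If only perfect matches should be returned, the `loose` clause can be dropped so that the `constant_score` clause is the whole query.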