First, some background. I understand that the algorithmic stemmer is not perfect: for example, "focused" is stemmed to "focus," while "focus" is stemmed to "focu." I have confirmed this by inspecting the term vectors.
However, when I execute queries like:
GET /_search
{
  "query": {
    "query_string": {
      "query": "focus*",
      "analyze_wildcard": true,
      "allow_leading_wildcard": true
    }
  }
}
across documents with explicitly mapped "text" fields that contain "focus" and "focused," I get hits for both terms.
I assumed that the wildcard term would not be analyzed, so "focus*" would match only the indexed term "focus" (the stem of "focused") and not the indexed term "focu" (the stem of "focus"). In other words, I expected results for "focused" only.
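To make that expectation concrete, here is a toy Python sketch. It is not Elasticsearch code: the dictionary stands in for the inverted index, the stems are the ones I saw in the term vectors, and the document ids and the helper `match_wildcard` are made up for illustration.

```python
from fnmatch import fnmatchcase

# Toy inverted index: stemmed term -> ids of documents containing it.
# The stems mirror the term vectors; the document ids are hypothetical.
inverted_index = {
    "focu": {"doc_with_focus"},     # "focus"   is indexed as "focu"
    "focus": {"doc_with_focused"},  # "focused" is indexed as "focus"
}


def match_wildcard(pattern, index):
    """Return ids of documents whose indexed terms match the wildcard pattern."""
    hits = set()
    for term, doc_ids in index.items():
        # Compare the pattern against the indexed term, not the original text.
        if fnmatchcase(term, pattern):
            hits |= doc_ids
    return hits


# The pattern is used exactly as typed, with no stemming applied to it.
expected_hits = match_wildcard("focus*", inverted_index)
print(sorted(expected_hits))
```

Under this model only the document containing "focused" comes back, which is not what I observe in practice.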
The only conclusion I can draw is that the query is searching the entire source document rather than just the inverted index, but I haven't found any documentation confirming this.
This leaves me with the following questions:
- How do query string queries containing wildcards search for matches?
- If the query is doing a full sweep of the source documents, is there a way to disable this behavior and only utilize the inverted index?
Please do not comment on the inefficiency of wildcard queries; I'm here purely to understand how the operation is carried out.