Hi,
I'm using Elasticsearch and I’ve noticed something strange. For some queries, I get no documents back, even though I have set top_k = 4. What’s confusing is that sometimes the same query does return results, but other times it doesn't.
Out of 100 queries, a few of them return nothing at all.
Has anyone seen this happen before? Any idea what might be causing it?
Thanks for your help!
Hi @Sushmita_Gupta,
Welcome! Which version of Elasticsearch are you using? Can you share the query you are running with top_k=4 and the documents you expect to match?
Let us know!
Hi @carly.richmond,
I am currently using Elasticsearch version 8.8.1 on my local machine (not on the cloud), on a development server. I ran 100 queries, and 14 of them did not retrieve any documents. I repeated the experiment with the same 100 queries, and this time only 4 queries did not retrieve any documents.
Could it be a timeout? Would the search return zero hits if the query does not finish in time and gets killed? Is the dataset you are querying large?
No, it's not a very large dataset — it's only 10 GB of data in Elasticsearch.
Can you share an example query and documents that you expect to be returned for us to investigate? It's very difficult to say what's happening without more information.
You could also try investigating the query using the Search Profiler in Kibana or the profile API.
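For example, with "profile": true in the search body, the response should carry a profile section per shard. Below is a minimal sketch, assuming the response is the usual dict-like JSON body, that lists the top-level query types per shard so that a query matching nothing stands out. The helper name profiled_query_types and the sample response are made up for illustration; the sample is hand-built, not real cluster output.

```python
def profiled_query_types(response):
    """Return (shard_id, query_type) pairs from a profiled search response.

    Assumes the layout profile -> shards -> searches -> query, where each
    query entry has a "type" key. Missing sections are treated as empty.
    """
    pairs = []
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for query in search.get("query", []):
                pairs.append((shard.get("id"), query.get("type")))
    return pairs


# Hand-built stand-in for a response (hypothetical shard id, not real output).
sample = {
    "profile": {
        "shards": [
            {
                "id": "[node-1][my-index][0]",
                "searches": [{"query": [{"type": "MatchNoDocsQuery"}]}],
            }
        ]
    }
}

for shard_id, query_type in profiled_query_types(sample):
    print(shard_id, query_type)
```

Running this against the responses of a failing and a working query and comparing the two lists is usually a quick first check.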
Hope that helps!
I used the query profiler. In the cases where no documents came back, a MatchNoDocsQuery was invoked, which means no documents were matched. The query uses a pre-filter: the filter is applied to the docs first, and the kNN search then runs on the filtered documents.
Please find the query below:
elastic_query = {
    "size": 4,
    "knn": {
        "field": "embedding",
        "k": 4,
        "query_vector": embedded_query,
        "num_candidates": 1000,
        "filter": {
            "term": {
                "meta._bot": _bot
            }
        },
    },
    "_source": ["text", "meta"],
    "profile": True
}
When I increased num_candidates, the query started working. It also returned the proper docs after I reindexed them. Now I am confused about what could cause this behaviour: is the HNSW graph traversal suboptimal, or is it something else?
Also, according to the profiler, on the faulty index num_candidates was 100, but the actual number of docs used to generate the response was only 33. When I set num_candidates equal to the total number of docs in the index, it worked. Yet the same query with num_candidates set to 100 also worked on a new index with the reindexed docs.
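One way to narrow this down is to run the same filtered kNN query with several num_candidates values and record how many hits come back for each. The sketch below is only an illustration under assumptions: build_knn_body, sweep_num_candidates and fake_search are hypothetical names, search_fn stands for whatever callable sends the body to your cluster and returns the response, and fake_search is a stand-in so the example runs without a cluster (its behaviour is invented, not measured).

```python
def build_knn_body(query_vector, bot, k=4, num_candidates=100):
    """Build the filtered kNN search body used in this thread."""
    return {
        "size": k,
        "knn": {
            "field": "embedding",
            "k": k,
            "query_vector": query_vector,
            "num_candidates": num_candidates,
            "filter": {"term": {"meta._bot": bot}},
        },
        "_source": ["text", "meta"],
    }


def sweep_num_candidates(search_fn, query_vector, bot,
                         candidates=(100, 500, 1000), k=4):
    """Return {num_candidates: number_of_hits} for each value tried."""
    results = {}
    for n in candidates:
        body = build_knn_body(query_vector, bot, k=k, num_candidates=n)
        response = search_fn(body)
        results[n] = len(response["hits"]["hits"])
    return results


def fake_search(body):
    # Stand-in for a real search call; invented behaviour for demonstration:
    # returns no hits below 500 candidates, otherwise `size` dummy hits.
    n = body["knn"]["num_candidates"]
    hits = [] if n < 500 else [{"_id": str(i)} for i in range(body["size"])]
    return {"hits": {"hits": hits}}


print(sweep_num_candidates(fake_search, [0.1, 0.2], "bot-a"))
```

Replacing fake_search with a function that calls the real cluster, and running the sweep against both the faulty index and the reindexed one, would show whether the empty results depend on num_candidates, on the index, or on both.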