Query randomly returns empty and correct results

Elasticsearch 5.6.5, with 2 data nodes + 1 index node, and 2 shards per index.

For the same query
http://localhost:9200/indexname-2019.03/_search?q=_id:1990868144

Every time I run it I get a different result.
The first time I get:
{"took":2,"timed_out":false,"_shards":{"total":2,"successful":2,"skipped":0,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
Second time:
{"took":2,"timed_out":false,"_shards":{"total":2,"successful":2,"skipped":0,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"indexname-2019.03","_type":"myType","_id":"1990868144","_score":1.0,"_source":{...}}]}}

Also, if I add a time range to the query, or search for this item via Kibana (which always adds a time range), I never get this document back.
This is the second time I've experienced such an issue. The first time, I just updated the document and it went back to normal.

Questions are:

  1. What is happening here?
  2. Is there any way to know how many other invalid documents I have?

Are you concurrently indexing and/or deleting this document? If so then we'd expect to get different results over time.

If not, I suspect that your searches are hitting different shard copies which are exposing different documents to the search. You can use the search preference (e.g. _only_nodes) to make Elasticsearch use a particular shard copy each time, and you should see consistent results from each shard copy.
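For example, something along these lines (a sketch; nodeA is a placeholder — list your real node IDs with GET /_cat/nodes?v&h=id,name&full_id=true):

curl 'http://localhost:9200/indexname-2019.03/_search?q=_id:1990868144&preference=_only_nodes:nodeA'

Running the same search once per data node and comparing hits.total will show whether the copies disagree.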

Have you refreshed this index? The different copies of a shard do not refresh in a coordinated fashion, so in general they each expose different documents to searches.
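To rule that out, you could force a refresh of all copies before searching; a minimal sketch:

curl -X POST 'http://localhost:9200/indexname-2019.03/_refresh'

A refresh makes everything indexed since the last refresh visible to search on every copy of the shard.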

If this persists after a refresh then I suspect the shard copies have fallen out of sync. This is possible in 5.x in certain circumstances. This has been addressed in 6.x with some major changes to the replication model.

Thanks for the fast reply.

Some facts/answers:

  • The item from my question was indexed on 2019-03-25. Since then, nothing has touched/reindexed/updated/deleted it. The problem was only found today, 6 days later.

  • From the Elasticsearch response I see it queries both shards and doesn't report any problems.

  • The index refreshes every 30 sec, so that's not the issue.

  • Our DBAs have already re-indexed this document and the issue is gone. I can't try the suggested "search preferences" now, but I will next time.

  • We are planning to upgrade our cluster to 6.x. Now that you say a related issue was fixed there, we will prioritize it.

The only question I have for now: is there a way to find other problematic documents/shards? We found this one incidentally, while checking invalid data for a client.

Again, thanks for the reply.

Sure, you have two shards, each with multiple copies (either primary or replica). Each search only searches one copy of each shard.
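As an aside, the _search_shards API shows the candidate copies (primaries and replicas, with the node holding each) that a search on this index could be routed to; a sketch:

curl 'http://localhost:9200/indexname-2019.03/_search_shards?pretty'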

Not very easily. One way is to find all the copies of all the shards, then enumerate the IDs in each shard copy and look for differences by doing two scroll searches using the _shards and _only_nodes preferences to control which shard copy you're searching.
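A rough sketch of that procedure, assuming the copies of shard 0 live on nodes with IDs nodeA and nodeB (both placeholders — take the real layout from the first command):

# where does each copy of each shard live?
curl 'http://localhost:9200/_cat/shards/indexname-2019.03?v'

# open a scroll over all the IDs in the copy of shard 0 held by nodeA
curl -X POST 'http://localhost:9200/indexname-2019.03/_search?scroll=1m&preference=_shards:0|_only_nodes:nodeA' -H 'Content-Type: application/json' -d '{"size": 1000, "_source": false, "sort": ["_doc"]}'

# keep pulling pages with the scroll_id returned by each response
curl -X POST 'http://localhost:9200/_search/scroll' -H 'Content-Type: application/json' -d '{"scroll": "1m", "scroll_id": "<scroll_id from the previous response>"}'

Repeat with _only_nodes:nodeB, collect the _id values from each pass, and diff the two lists; then do the same for shard 1.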
