ES returns different results (and totals) alternately

Hi, could use some help here.

We ran into something on our ES 5.4 production cluster (4 nodes) a few days ago, with one of our indices (~500 GB): the exact same request to the same server was returning different results and different total hits alternately. The first call returned total x, then y, then x again, then y again, and so forth. Other requests against the same index showed the same behaviour, though the other indices in the cluster seemed unaffected.

We restored the index on a separate node and the problem persisted. Today, however, the issue disappeared practically by itself (we had reindexed a lot of the data, but not all of it). Still very worrying, of course.

Any idea what caused the problem, and how we could avoid it in the future?

The schema is quite big so I won't add it here, but let me know if more details are needed.

I had this once because of a kind of split-brain issue. That might not be the case here, though.

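Maybe first try to find which shard copy differs from the others. Alternating totals typically mean that a primary and its replica have diverged, since searches are spread across the copies of each shard. The preference search option can be useful here, because it lets you send the same query to a specific copy and compare the totals.
_cat/shards will also help, I think.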
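
To make that concrete, here is a rough Python sketch of both checks. It is only an illustration under some assumptions: the helper names, `base_url` and the index name are made up, the `_cat/shards` rows are assumed to come from `?format=json`, and the `_shards:N|_primary` / `_shards:N|_replica` preference values are the ones documented for 5.x (they were removed in later major versions). Doc counts can also differ briefly between copies until a refresh happens, so treat a mismatch as a hint rather than proof.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode
from urllib.request import urlopen


def find_mismatched_shards(cat_rows):
    """Return {shard: {"p@node": docs, ...}} for shards whose copies disagree."""
    by_shard = defaultdict(dict)
    for row in cat_rows:
        if row.get("state") != "STARTED":
            continue  # unassigned or initializing copies have no usable count
        label = "{}@{}".format(row["prirep"], row["node"])
        by_shard[row["shard"]][label] = int(row["docs"])
    return {
        shard: copies
        for shard, copies in by_shard.items()
        if len(set(copies.values())) > 1
    }


def preference_search_url(base_url, index, shard, copy):
    """Build a size=0 search URL pinned to one shard and one copy type.

    copy is "_primary" or "_replica" (ES 5.x preference values).
    """
    params = {"size": 0, "preference": "_shards:{}|{}".format(shard, copy)}
    return "{}/{}/_search?{}".format(base_url, index, urlencode(params))


def fetch_cat_shards(base_url, index):
    """Fetch _cat/shards as JSON; needs network access to the cluster."""
    url = "{}/_cat/shards/{}?format=json".format(base_url, index)
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be something like `find_mismatched_shards(fetch_cat_shards("http://localhost:9200", "my_index"))`, and then for each suspicious shard, running the same query through the two URLs from `preference_search_url` and comparing `hits.total`.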

If only a replica is wrong, either set the number of replicas to 0 (and raise it again afterwards so the replicas are rebuilt from the primaries), or shut down the node holding the bad copy, remove its data directory and restart it.

Note that this will probably copy a lot of data over the wire.
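
For the replica reset, a minimal sketch of the two settings calls in Python could look like the following. The function name, `base_url` and the index name are placeholders I made up; the request itself targets the index `_settings` endpoint with `index.number_of_replicas`. The function only builds the request, so nothing is sent until you pass it to `urlopen`.

```python
import json
from urllib.request import Request


def replica_settings_request(base_url, index, replicas):
    """Build (but do not send) a PUT that changes number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return Request(
        url="{}/{}/_settings".format(base_url, index),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical sequence: drop the replicas, then restore the previous count
# so fresh copies are recovered from the primaries.
drop = replica_settings_request("http://localhost:9200", "my_index", 0)
restore = replica_settings_request("http://localhost:9200", "my_index", 1)
```

To actually apply it you would call `urllib.request.urlopen(drop)`, wait for the cluster to settle, and then `urlopen(restore)`. While the replica count is 0 the index has no redundancy, so only do this if you are sure the primaries are the good copies.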
