We have an Elasticsearch cluster running version 5.6.16, with ~5.7k primary shards across ~2k indices and 28 nodes (3 masters, 3 coordinators, and 22 data nodes):
{
  "cluster_name": "foo",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 28,
  "number_of_data_nodes": 22,
  "active_primary_shards": 5778,
  "active_shards": 11556,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}
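For reference, that's the output of the cluster health API (same host/port as the search below):

curl -s 'http://localhost:9200/_cluster/health' | jq -r '.'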
At some point, any search we run against certain indices starts returning an insanely high doc count, even when no documents actually match.
For instance:
curl -s 'http://localhost:9200/*/_search?q=nope:thiswillneverexist&terminate_after=1' | jq -r '.'
{
  "took": 871,
  "timed_out": false,
  "terminated_early": false,
  "num_reduce_phases": 12,
  "_shards": {
    "total": 5778,
    "successful": 5778,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9787770,
    "max_score": 2,
    "hits": []
  }
}
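As a sanity check, here's the equivalent _count query; with a query that matches nothing it should obviously report 0:

curl -s 'http://localhost:9200/*/_count?q=nope:thiswillneverexist' | jq -r '.count'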
I couldn't find anything in the logs, any correlation with other events, or any similar reports in issues/forums/etc. (maybe I just don't know exactly what to search for).
The only workaround we've found is to restart all the data nodes.
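For reference, the restart we do is the standard rolling restart (disable allocation, bounce the node, re-enable; the systemd unit name is specific to our hosts):

# disable shard allocation so shards aren't shuffled while each node is down
curl -s -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'

# restart the data node
sudo systemctl restart elasticsearch

# re-enable allocation and wait for the cluster to go green before the next node
curl -s -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"transient":{"cluster.routing.allocation.enable":"all"}}'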
Anyway, has anyone seen anything like this? Any pointers on what I could investigate?
Thanks!