Cluster eventually starts giving absurdly wrong counts on search

We have a cluster running version 5.6.16. It has ~5.7k primary shards, ~2k indices and 28 nodes (3 masters, 3 coordinators and 22 data nodes):

{
  "cluster_name": "foo",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 28,
  "number_of_data_nodes": 22,
  "active_primary_shards": 5778,
  "active_shards": 11556,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}

Eventually, any search we run against certain indices returns an absurdly high doc count, even when no results are found.

For instance:

curl -s 'http://localhost:9200/*/_search?q=nope:thiswillneverexist&terminate_after=1' | jq -r '.'
{
  "took": 871,
  "timed_out": false,
  "terminated_early": false,
  "num_reduce_phases": 12,
  "_shards": {
    "total": 5778,
    "successful": 5778,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9787770,
    "max_score": 2,
    "hits": []
  }
}
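
In case it helps with diagnosing, here's a rough sketch of how the same impossible query can be run per index, to flag which indices report a non-zero total (it assumes jq, as above, and reuses the same made-up field name):

for idx in $(curl -s 'http://localhost:9200/_cat/indices?h=index'); do
  # size=0: we only care about hits.total, not the documents themselves
  total=$(curl -s "http://localhost:9200/${idx}/_search?q=nope:thiswillneverexist&size=0" | jq -r '.hits.total')
  # in 5.6, hits.total is a plain number, so a simple string compare is enough
  if [ "$total" != "0" ]; then
    echo "${idx}: ${total}"
  fi
done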

I couldn't find anything in the logs, any correlation with other events, or any similar issues/forum threads/etc. (maybe I just don't know exactly what to search for).

The only workaround we found was to restart all the data nodes :frowning:

Anyway, has anyone seen anything like this? Anything I could investigate?

Thanks!

FWIW, this happens even when no results should be found, and once the issue appears, the count is always exactly the same...
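
To be concrete about "always the same", this is the kind of check I mean (field names are made up; each query should obviously match nothing):

for field in nope1 nope2 nope3; do
  # once the issue kicks in, every one of these prints the same bogus hits.total
  curl -s "http://localhost:9200/*/_search?q=${field}:thiswillneverexist&size=0" | jq -r '.hits.total'
done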

That looks weird.

Unfortunately this version is not maintained anymore so even if it's a bug, it won't be fixed.

I wonder if you can still see this problem after upgrading to 6.8 or, better, 7.9.

Unfortunately this version is not maintained anymore so even if it's a bug, it won't be fixed.

Yes, that was my fear...

I wonder if you can still see this problem after upgrading to 6.8 or, better, 7.9.

Not that easy to do in our case, unfortunately...


Still hoping I can find a workaround a bit better than restarting the whole cluster, though :slightly_frowning_face:
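
In the meantime, when we do have to bounce nodes, I'd at least rather do it one data node at a time, roughly the usual rolling-restart sequence (the service name is just an example, adjust for your setup):

# disable shard allocation before restarting a data node
curl -s -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'

# restart Elasticsearch on that node (service name is just an example)
sudo systemctl restart elasticsearch

# once the node has rejoined, re-enable allocation
curl -s -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'

# wait for green before moving on to the next data node
curl -s 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=30m'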