Different document counts on each cluster node

Hi,

We are experiencing a problem on one of our Elasticsearch clusters (log data): each node reports a different document count, and some documents are actually missing (i.e. non-existent in the affected index, even after successfully bulk-indexing them).

A bit of history: three days ago one of the nodes was restarted. A day later, the good people of Amazon decided it was time to wreak some havoc again and removed the same node in a very rough way. After the node came back up, the nodes never managed to sync up their document counts again.

This is the first time this has ever happened; normally we see a full recovery (without any problems) whenever the wrath of Bezos strikes.

Elasticsearch version: 1.7.2.
Cluster status is green.

It looks like some of the shards might have a problem (a wild guess on my side); however, the green cluster status obviously suggests otherwise.
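I haven't looked at the individual shard copies yet, but I assume a per-shard listing along these lines (the docs column should be part of the default _cat/shards output on 1.7) would show whether a primary and its replicas report different counts:

curl '[host]:9200/_cat/shards?v'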

The day before yesterday I restarted node 3 (to see if a re-sync would equalize the doc counts); since then we have not seen any indexing or other issues (like missing docs), but the doc counts are still off.

Does anyone have an idea on how to analyse this / find the root cause?
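One thing I was considering trying (not verified yet; "logstash-*" is just a placeholder for our actual index pattern, and this assumes every node holds a copy of every shard, which the near-identical per-node totals below suggest) is counting documents per node via a search preference:

curl '[host]:9200/logstash-*/_search?search_type=count&preference=_only_node:5ckIUqVDTUyM1guSLcPLgQ'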

Here's an excerpt of a call to [host]:9200/_nodes/stats - perhaps this might help. As you can see, the doc counts are roughly 1,000 to 2,000 apart between nodes.
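For reference, the excerpt was produced with a plain stats call along these lines (?pretty is only there for readability):

curl '[host]:9200/_nodes/stats?pretty'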

Cheers,

- Chris
{
  "cluster_name": "elasticsearch-dp-logs",
  "nodes": {
    "5ckIUqVDTUyM1guSLcPLgQ": {
      ...
      "attributes": {
        "max_local_storage_nodes": "1",
        "aws_availability_zone": "eu-west-1a"
      },
      "indices": {
        "docs": {
          "count": 83416560,
          "deleted": 1365900
        },
        "store": {
          "size_in_bytes": 32864527757,
          "throttle_time_in_millis": 634799
        },
        "indexing": {
          "index_total": 453133,
          "index_time_in_millis": 69542,
          "index_current": 0,
          "delete_total": 72185,
          "delete_time_in_millis": 1347,
          "delete_current": 0,
          "noop_update_total": 0,
          "is_throttled": false,
          "throttle_time_in_millis": 0
        },
        ....
    },
    "C3OGC7SeQmqFi7GdznmmJQ": {
      ...
      "attributes": {
        "max_local_storage_nodes": "1",
        "aws_availability_zone": "eu-west-1b"
      },
      "indices": {
        "docs": {
          "count": 83417604,
          "deleted": 1365900
        },
        "store": {
          "size_in_bytes": 32865600740,
          "throttle_time_in_millis": 31078059
        },
        "indexing": {
          "index_total": 33108517,
          "index_time_in_millis": 5192754,
          "index_current": 15,
          "delete_total": 6669222,
          "delete_time_in_millis": 153449,
          "delete_current": 0,
          "noop_update_total": 0,
          "is_throttled": false,
          "throttle_time_in_millis": 0
        },
        ....
    },
    "pgS_AafQTzOJjDoSUYGQXQ": {
      ...
      "attributes": {
        "max_local_storage_nodes": "1",
        "aws_availability_zone": "eu-west-1c"
      },
      "indices": {
        "docs": {
          "count": 83415579,
          "deleted": 1365900
        },
        "store": {
          "size_in_bytes": 32863266796,
          "throttle_time_in_millis": 556604
        },
        "indexing": {
          "index_total": 370684,
          "index_time_in_millis": 51746,
          "index_current": 0,
          "delete_total": 58590,
          "delete_time_in_millis": 971,
          "delete_current": 0,
          "noop_update_total": 0,
          "is_throttled": false,
          "throttle_time_in_millis": 0
        },
        ...
  }
}