Hi,
We are experiencing a problem on one of our Elasticsearch clusters (log data): each node reports a different document count, and some documents are actually missing (i.e. they do not exist in the affected index, even though they were bulk-indexed successfully).
A bit of history: three days ago one of the nodes was restarted. A day later, the good people of Amazon decided it was time to wreak some havoc again and removed the same node in a very rough way. After it came back up, the nodes never managed to bring their document counts back in sync.
This is the first time this has happened; normally we see a full recovery (without any problems) whenever the wrath of Bezos has struck.
Elasticsearch version: 1.7.2.
Cluster status is green.
My wild guess is that some of the shards have a problem; the green cluster status, however, suggests the opposite.
The day before yesterday we restarted node 3 to see whether a re-sync would equalize the doc counts. Since then we have not seen any further indexing problems or missing docs; however, the doc counts are still off.
Does anyone have an idea how to analyse this and find the cause of the problem?
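One check that might narrow this down is comparing the doc count of every primary shard with that of its replicas. The following is only a rough sketch: it assumes the plain-text output of `[host]:9200/_cat/shards?h=index,shard,prirep,state,docs` as input, and the function name and the sample rows are made up for illustration. Counts can also differ briefly while indexing is in progress, so a mismatch is only meaningful on a quiet index.

```python
from collections import defaultdict


def find_mismatched_shards(cat_shards_text):
    """Group shard copies by (index, shard number) and return only the
    groups whose copies report different doc counts.

    Expects whitespace-separated rows with the columns
    index, shard, prirep, state, docs (in that order).
    """
    copies = defaultdict(list)
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            # Unassigned shards have no doc count column; skip them.
            continue
        index, shard, prirep, state, docs = fields[:5]
        if state != "STARTED" or not docs.isdigit():
            # Ignore header rows and shards that are still recovering.
            continue
        copies[(index, int(shard))].append((prirep, int(docs)))

    return {
        key: shard_copies
        for key, shard_copies in copies.items()
        if len({count for _, count in shard_copies}) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the real cluster.
    sample = (
        "logs-a 0 p STARTED 100\n"
        "logs-a 0 r STARTED 98\n"
        "logs-a 1 p STARTED 50\n"
        "logs-a 1 r STARTED 50\n"
    )
    for (index, shard), shard_copies in find_mismatched_shards(sample).items():
        print(index, shard, shard_copies)
```

Any shard this reports would be a candidate for a closer look, since the cluster health status alone does not compare doc counts between copies.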
Here's an excerpt of the response from [host]:9200/_nodes/stats; perhaps it helps. As you can see, the doc counts differ by roughly 1,000 to 2,000 between nodes.
Cheers,
- Chris
{
"cluster_name": "elasticsearch-dp-logs",
"nodes": {
"5ckIUqVDTUyM1guSLcPLgQ": {
...
"attributes": {
"max_local_storage_nodes": "1",
"aws_availability_zone": "eu-west-1a"
},
"indices": {
"docs": {
"count": 83416560,
"deleted": 1365900
},
"store": {
"size_in_bytes": 32864527757,
"throttle_time_in_millis": 634799
},
"indexing": {
"index_total": 453133,
"index_time_in_millis": 69542,
"index_current": 0,
"delete_total": 72185,
"delete_time_in_millis": 1347,
"delete_current": 0,
"noop_update_total": 0,
"is_throttled": false,
"throttle_time_in_millis": 0
},
....
},
"C3OGC7SeQmqFi7GdznmmJQ": {
...
"attributes": {
"max_local_storage_nodes": "1",
"aws_availability_zone": "eu-west-1b"
},
"indices": {
"docs": {
"count": 83417604,
"deleted": 1365900
},
"store": {
"size_in_bytes": 32865600740,
"throttle_time_in_millis": 31078059
},
"indexing": {
"index_total": 33108517,
"index_time_in_millis": 5192754,
"index_current": 15,
"delete_total": 6669222,
"delete_time_in_millis": 153449,
"delete_current": 0,
"noop_update_total": 0,
"is_throttled": false,
"throttle_time_in_millis": 0
},
....
},
"pgS_AafQTzOJjDoSUYGQXQ": {
...
"attributes": {
"max_local_storage_nodes": "1",
"aws_availability_zone": "eu-west-1c"
},
"indices": {
"docs": {
"count": 83415579,
"deleted": 1365900
},
"store": {
"size_in_bytes": 32863266796,
"throttle_time_in_millis": 556604
},
"indexing": {
"index_total": 370684,
"index_time_in_millis": 51746,
"index_current": 0,
"delete_total": 58590,
"delete_time_in_millis": 971,
"delete_current": 0,
"noop_update_total": 0,
"is_throttled": false,
"throttle_time_in_millis": 0
},
...
}
}
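
For anyone who wants to reproduce the comparison, here is a minimal sketch that computes how far each node's doc count lags behind the highest one. It assumes the `_nodes/stats` response has already been parsed into a Python dict; the function name is hypothetical, and the example input contains only the three `docs.count` values from the excerpt above.

```python
def doc_count_spread(nodes_stats):
    """Return, per node id, the difference between the highest
    indices.docs.count in the cluster and that node's own count."""
    counts = {
        node_id: node["indices"]["docs"]["count"]
        for node_id, node in nodes_stats["nodes"].items()
    }
    highest = max(counts.values())
    return {node_id: highest - count for node_id, count in counts.items()}


if __name__ == "__main__":
    # Only the doc counts from the excerpt; all other fields are omitted.
    stats = {
        "nodes": {
            "5ckIUqVDTUyM1guSLcPLgQ": {"indices": {"docs": {"count": 83416560}}},
            "C3OGC7SeQmqFi7GdznmmJQ": {"indices": {"docs": {"count": 83417604}}},
            "pgS_AafQTzOJjDoSUYGQXQ": {"indices": {"docs": {"count": 83415579}}},
        }
    }
    print(doc_count_spread(stats))
    # {'5ckIUqVDTUyM1guSLcPLgQ': 1044, 'C3OGC7SeQmqFi7GdznmmJQ': 0, 'pgS_AafQTzOJjDoSUYGQXQ': 2025}
```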