This is our beta cluster, which has always been a bit weird.
And here is some output, formatted:
manybubbles@deployment-bastion:/srv/mediawiki$ curl -s deployment-elastic05:9200/_flush/synced | python -mjson.tool | tee synced_flush
{
    "_shards": {
        "failed": 25,
        "successful": 740,
        "total": 765
    },
    "commonswiki_content_first": {
        "failed": 0,
        "successful": 3,
        "total": 3
    },
    "commonswiki_file_first": {
        "failed": 9,
        "failures": [
            {
                "reason": "pending operations",
                "routing": {
                    "index": "commonswiki_file_first",
                    "node": "x0CoVTqaTTqj1IuoLIB0LQ",
                    "primary": false,
                    "relocating_node": null,
                    "shard": 3,
                    "state": "STARTED"
                },
                "shard": 3
            },
            {
                "reason": "pending operations",
                "routing": {
                    "index": "commonswiki_file_first",
                    "node": "RWFa4RqbQjG7xmP95l_b0Q",
                    "primary": true,
                    "relocating_node": null,
                    "shard": 3,
                    "state": "STARTED"
                },
                "shard": 3
            },
            {
                "reason": "pending operations",
                "routing": {
                    "index": "commonswiki_file_first",
                    "node": "-qPd7LtcQRaszihtRgytWA",
                    "primary": false,
                    "relocating_node": null,
                    "shard": 1,
                    "state": "STARTED"
                },
                "shard": 1
            },
            {
                "reason": "pending operations",
                "routing": {
                    "index": "commonswiki_file_first",
                    "node": "x0CoVTqaTTqj1IuoLIB0LQ",
                    "primary": false,
                    "relocating_node": null,
                    "shard": 1,
                    "state": "STARTED"
                },
                "shard": 1
            },
            {
                "reason": "pending operations",
                "routing": {
                    "index": "commonswiki_file_first",
                    "node": "Ro1ZcPd9SVetETtEgl8Ceg",
                    "primary": false,
                    "relocating_node": null,
                    "shard": 1,
                    "state": "STARTED"
                },
                "shard": 1
            },
            {
                "reason": "pending operations",
                "routing": {
                    "index": "commonswiki_file_first",
                    "node": "RWFa4RqbQjG7xmP95l_b0Q",
                    "primary": true,
                    "relocating_node": null,
                    "shard": 1,
                    "state": "STARTED"
                },
                "shard": 1
            },
            {
                "reason": "pending operations",
                "routing": {
                    "index": "commonswiki_file_first",
                    "node": "-qPd7LtcQRaszihtRgytWA",
                    "primary": false,
                    "relocating_node": null,
                    "shard": 6,
                    "state": "STARTED"
                },
                "shard": 6
            },
            {
                "reason": "pending operations",
                "routing": {
                    "index": "commonswiki_file_first",
                    "node": "RWFa4RqbQjG7xmP95l_b0Q",
                    "primary": true,
                    "relocating_node": null,
                    "shard": 6,
                    "state": "STARTED"
                },
                "shard": 6
            },
            {
                "reason": "pending operations",
                "routing": {
                    "index": "commonswiki_file_first",
                    "node": "Ro1ZcPd9SVetETtEgl8Ceg",
                    "primary": false,
                    "relocating_node": null,
                    "shard": 6,
                    "state": "STARTED"
                },
                "shard": 6
            }
        ],
        "successful": 19,
        "total": 28
    },
...
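Reading nine near-identical failure objects by eye is tedious, so here is a small Python 3 sketch that groups the failed shard copies in a saved response (such as the `synced_flush` file written by `tee` above) by index and shard. It assumes only the response layout visible in the output above; `summarize_synced_flush` and `EXCERPT` are hypothetical names, and the embedded excerpt is trimmed to the two shard-3 failures.

```python
import json
from collections import defaultdict


def summarize_synced_flush(result):
    """Group failed shard copies of a synced-flush response by index and shard.

    Assumes the layout shown above: a top-level "_shards" summary plus one
    object per index, each optionally carrying a "failures" list whose
    entries have "shard", "reason" and "routing" fields.
    """
    summary = {}
    for index, stats in result.items():
        if index == "_shards":
            continue  # cluster-wide totals, not an index
        by_shard = defaultdict(list)
        for failure in stats.get("failures", []):
            routing = failure["routing"]
            role = "primary" if routing["primary"] else "replica"
            by_shard[failure["shard"]].append((routing["node"], role, failure["reason"]))
        if by_shard:  # skip indices whose flush fully succeeded
            summary[index] = dict(sorted(by_shard.items()))
    return summary


# Trimmed excerpt of the response above: only the two shard-3 failures are kept.
EXCERPT = json.loads("""
{
  "_shards": {"failed": 25, "successful": 740, "total": 765},
  "commonswiki_content_first": {"failed": 0, "successful": 3, "total": 3},
  "commonswiki_file_first": {
    "failed": 9,
    "failures": [
      {"reason": "pending operations", "shard": 3,
       "routing": {"index": "commonswiki_file_first", "node": "x0CoVTqaTTqj1IuoLIB0LQ",
                   "primary": false, "relocating_node": null, "shard": 3, "state": "STARTED"}},
      {"reason": "pending operations", "shard": 3,
       "routing": {"index": "commonswiki_file_first", "node": "RWFa4RqbQjG7xmP95l_b0Q",
                   "primary": true, "relocating_node": null, "shard": 3, "state": "STARTED"}}
    ],
    "successful": 19,
    "total": 28
  }
}
""")

# Print one line per failed shard copy.
for index, shards in summarize_synced_flush(EXCERPT).items():
    for shard, copies in shards.items():
        for node, role, reason in copies:
            print(f"{index}[{shard}] {role} on {node}: {reason}")
```

Pointed at the full file instead, e.g. `summarize_synced_flush(json.load(open("synced_flush")))`, the grouping makes patterns easier to spot; in the output above, for instance, every primary listed for `commonswiki_file_first` (shards 1, 3 and 6) sits on node `RWFa4RqbQjG7xmP95l_b0Q`.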
We've totally isolated the cluster; here are the active thread counts for every pool on each node:
manybubbles@deployment-bastion:/srv/mediawiki$ curl 'deployment-elastic05:9200/_cat/thread_pool?v&h=host,optimize.active,merge.active,warmer.active,suggest.active,snapshot.active,search.active,refresh.active,percolate.active,management.active,index.active,get.active,generic.active,flush.active,bulk.active'
host                 optimize.active merge.active warmer.active suggest.active snapshot.active search.active refresh.active percolate.active management.active index.active get.active generic.active flush.active bulk.active
deployment-elastic05               0            0             0              0               0             0              0                0                 1            0          0              0            0           0
deployment-elastic08               0            0             0              0               0             0              0                0                 1            0          0              0            0           0
deployment-elastic06               0            0             0              0               0             0              0                0                 1            0          0              0            0           0
deployment-elastic07               0            0             0              0               0             0              0                0                 1            0          0              0            0           0
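To check a wide thread pool table like this mechanically rather than by eye, a sketch along these lines parses verbose `_cat` output (the header row enabled by `v`) and reports every non-zero `*.active` column per host. It assumes column values never contain spaces, which holds for this table; `parse_cat` and `busy_pools` are hypothetical helpers, and `SAMPLE` is the output above copied verbatim.

```python
def parse_cat(text):
    """Split verbose (`?v`) _cat output into one dict per row, keyed by header.

    Naive whitespace split: assumes no value contains a space.
    """
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def busy_pools(text):
    """Return {host: {column: count}} for every non-zero *.active column."""
    busy = {}
    for row in parse_cat(text):
        active = {
            column: int(value)
            for column, value in row.items()
            if column.endswith(".active") and int(value) != 0
        }
        if active:
            busy[row["host"]] = active
    return busy


# The _cat/thread_pool output above, copied verbatim.
SAMPLE = """\
host optimize.active merge.active warmer.active suggest.active snapshot.active search.active refresh.active percolate.active management.active index.active get.active generic.active flush.active bulk.active
deployment-elastic05 0 0 0 0 0 0 0 0 1 0 0 0 0 0
deployment-elastic08 0 0 0 0 0 0 0 0 1 0 0 0 0 0
deployment-elastic06 0 0 0 0 0 0 0 0 1 0 0 0 0 0
deployment-elastic07 0 0 0 0 0 0 0 0 1 0 0 0 0 0
"""

for host, pools in busy_pools(SAMPLE).items():
    print(host, pools)
```

On this sample every node reports exactly one active management thread and nothing else.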
But maybe I have a hint:
watch curl -s \''deployment-elastic05:9200/_cat/thread_pool?v&h=host,management.completed'\'
Every 2.0s: curl -s 'deployment-elastic05:9200/_cat/thread_pool?v&h=host,management.completed' Wed Jul 8 15:49:46 2015
host                 management.completed
deployment-elastic05              2519025  <--- These numbers keep going up
deployment-elastic08              2479017
deployment-elastic06              2506622
deployment-elastic07              2517234
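Since `watch` only shows the counters climbing, a small sketch can turn two samples into a per-node rate of completed management tasks. It assumes Python 3 and plain HTTP access to port 9200 from wherever it runs; `CAT_URL`, `parse_completed`, `fetch_completed`, `completion_rates` and `measure` are illustrative names, not part of any existing tool. Dropping `v` from the query leaves the `_cat` output without its header row, which keeps the parsing trivial.

```python
import time
import urllib.request

# Hypothetical endpoint: same query as the watch command, minus the `v` header flag.
CAT_URL = "http://deployment-elastic05:9200/_cat/thread_pool?h=host,management.completed"


def parse_completed(text):
    """Map host -> management.completed from header-less _cat output."""
    counts = {}
    for line in text.strip().splitlines():
        host, completed = line.split()
        counts[host] = int(completed)
    return counts


def fetch_completed(url=CAT_URL):
    """Fetch one sample of the counters over HTTP (network access assumed)."""
    with urllib.request.urlopen(url) as resp:
        return parse_completed(resp.read().decode("utf-8"))


def completion_rates(before, after, seconds):
    """Management tasks completed per second on each host between two samples."""
    return {
        host: (after[host] - before[host]) / seconds
        for host in after
        if host in before
    }


def measure(interval=2.0, url=CAT_URL):
    """Sample twice, `interval` seconds apart, and return per-host rates."""
    first = fetch_completed(url)
    time.sleep(interval)
    second = fetch_completed(url)
    return completion_rates(first, second, interval)
```

Calling `measure()` from the bastion would sample twice, 2 seconds apart by default (mirroring the `watch` interval), and report how many management tasks each node completes per second while the cluster is supposedly idle.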