Synced Flush Causes Node to Restart


(Harlin) #1

I was attempting to do a cluster wide restart on my Elasticsearch cluster by first doing a synced flush, but ran into problems. The sync caused one of our nodes to randomly restart, sending our cluster into a time-consuming recovery phase.

Before the sync call I set: "cluster.routing.allocation.enable": "none", in preparation for the restart.

I am running an 8 node cluster. All nodes are master + data nodes. We have a total of 4TB of data across the cluster.

Has anybody else ran into issues with synced flushes?

Thanks in advanced for any help,
Harlin


(Mark Walkom) #2

What version, did you check your logs?

This isn't something I've seen.


(Harlin) #3

The logs only show the shutdown, not much else:

[2015-06-23 20:36:21,134][INFO ][discovery.ec2 ] [Leech] master_left [[Wizard][C4feeCcxQQKuCx8eItWuAQ][ip-10-70-104-157][inet[/10.70.104.157:9300]]{master=true}], reason [shut_down]
[2015-06-23 20:36:21,136][WARN ][discovery.ec2 ] [Leech] master left (reason = shut_down), current nodes: {[Shriker][qjQLhrV4QGW69gje_bbEgg][ip-10-70-113-90][inet[/10.70.113.90:9300]]{master=true},[Blur][AXei-iNGQYOq9zqIMGW1XQ][ip-10-70-113-89][inet[/10.70.113.89:9300]]{master=true},[Thunderstrike][8jnth1NDRECV_VRxhQKcLw][ip-10-70-69-119][inet[/10.70.69.119:9300]]{master=true},[Jack Kirby][6hYvIZ9PQ4ah9bgPeYjNOA][ip-10-70-113-91][inet[/10.70.113.91:9300]]{master=true},[Armand Martel][6O2SniZ-TGO7_p193-hiwA][ip-10-70-113-94][inet[/10.70.113.94:9300]]{master=true},[Manikin][a2j-vPTRTPG0GjM5VA4N5w][ip-10-70-113-93][inet[/10.70.113.93:9300]]{master=true},[Leech][44XSwmYqS2e2kBEFc8O7RQ][ip-10-70-113-92][inet[/10.70.113.92:9300]]{master=true},}
[2015-06-23 20:36:21,137][INFO ][cluster.service ] [Leech] removed {[Wizard][C4feeCcxQQKuCx8eItWuAQ][ip-10-70-104-157][inet[/10.70.104.157:9300]]{master=true},}, reason: zen-disco-master_failed ([Wizard][C4feeCcxQQKuCx8eItWuAQ][ip-10-70-104-157][inet[/10.70.104.157:9300]]{master=true})
[2015-06-23 20:36:28,715][INFO ][cluster.service ] [Leech] new_master [Leech][44XSwmYqS2e2kBEFc8O7RQ][ip-10-70-113-92][inet[/10.70.113.92:9300]]{master=true}, reason: zen-disco-join (elected_as_master)
[2015-06-23 20:36:40,322][INFO ][cluster.service ] [Leech] added {[Anvil][uX49t4y7RlWYOanQb3t2yA][ip-10-70-104-157][inet[/10.70.104.157:9300]]{master=true},}, reason: zen-disco-receive(join from node[[Anvil][uX49t4y7RlWYOanQb3t2yA][ip-10-70-104-157][inet[/10.70.104.157:9300]]{master=true}])

It was the master node that failed, maybe the flush was too hard on the cluster state?


(Harlin) #4

So when we do a GET on the synced flush it looks like there are some pending operations:

"failures": [
{"shard": 22,"reason": "pending operations","routing": {"state": "STARTED","primary": false,"node": "44XSwmYqS2e2kBEFc8O7RQ","relocating_node": null,"shard": 22,"index": "whois-current-v3"}},
{"shard": 39,"reason": "pending operations","routing": {"state": "STARTED","primary": false,"node": "qjQLhrV4QGW69gje_bbEgg","relocating_node": null,"shard": 39,"index": "index"}},
{"shard": 58,"reason": "pending operations","routing": {"state": "STARTED","primary": true,"node": "44XSwmYqS2e2kBEFc8O7RQ","relocating_node": null,"shard": 58,"index": "index"}},
{"shard": 28,"reason": "pending operations","routing": {"state": "STARTED","primary": false,"node": "6hYvIZ9PQ4ah9bgPeYjNOA","relocating_node": null,"shard": 28,"index": "index"}}]},

This is slightly confusing though since we haven't touched the cluster in terms of indexing for several days and there is no balancing/relocating happening


(Mark Walkom) #5

What version are you on?


(Harlin) #6

We are on ES 1.6.


(Nik Everett) #7

Looks like we're also seeing some pending operations when we don't think we have any.


(Nik Everett) #8

This is our beta cluster which has always been a bit weird.

And have some output format:

manybubbles@deployment-bastion:/srv/mediawiki$ curl -s deployment-elastic05:9200/_flush/synced | python -mjson.tool | tee synced_flush
{
    "_shards": {
        "failed": 25, 
        "successful": 740, 
        "total": 765
    }, 
    "commonswiki_content_first": {
        "failed": 0, 
        "successful": 3, 
        "total": 3
    }, 
    "commonswiki_file_first": {
        "failed": 9, 
        "failures": [
            {
                "reason": "pending operations", 
                "routing": {
                    "index": "commonswiki_file_first", 
                    "node": "x0CoVTqaTTqj1IuoLIB0LQ", 
                    "primary": false, 
                    "relocating_node": null, 
                    "shard": 3, 
                    "state": "STARTED"
                }, 
                "shard": 3
            }, 
            {
                "reason": "pending operations", 
                "routing": {
                    "index": "commonswiki_file_first", 
                    "node": "RWFa4RqbQjG7xmP95l_b0Q", 
                    "primary": true, 
                    "relocating_node": null, 
                    "shard": 3, 
                    "state": "STARTED"
                }, 
                "shard": 3
            }, 
            {
                "reason": "pending operations", 
                "routing": {
                    "index": "commonswiki_file_first", 
                    "node": "-qPd7LtcQRaszihtRgytWA", 
                    "primary": false, 
                    "relocating_node": null, 
                    "shard": 1, 
                    "state": "STARTED"
                }, 
                "shard": 1
            }, 

            {
                "reason": "pending operations", 
                "routing": {
                    "index": "commonswiki_file_first", 
                    "node": "x0CoVTqaTTqj1IuoLIB0LQ", 
                    "primary": false, 
                    "relocating_node": null, 
                    "shard": 1, 
                    "state": "STARTED"
                }, 
                "shard": 1
            }, 
            {
                "reason": "pending operations", 
                "routing": {
                    "index": "commonswiki_file_first", 
                    "node": "Ro1ZcPd9SVetETtEgl8Ceg", 
                    "primary": false, 
                    "relocating_node": null, 
                    "shard": 1, 
                    "state": "STARTED"
                }, 
                "shard": 1
            }, 
            {
                "reason": "pending operations", 
                "routing": {
                    "index": "commonswiki_file_first", 
                    "node": "RWFa4RqbQjG7xmP95l_b0Q", 
                    "primary": true, 
                    "relocating_node": null, 
                    "shard": 1, 
                    "state": "STARTED"
                }, 
                "shard": 1
            }, 
            {
                "reason": "pending operations", 
                "routing": {
                    "index": "commonswiki_file_first", 
                    "node": "-qPd7LtcQRaszihtRgytWA", 
                    "primary": false, 
                    "relocating_node": null, 
                    "shard": 6, 
                    "state": "STARTED"
                }, 
                "shard": 6
            }, 

            {
                "reason": "pending operations", 
                "routing": {
                    "index": "commonswiki_file_first", 
                    "node": "RWFa4RqbQjG7xmP95l_b0Q", 
                    "primary": true, 
                    "relocating_node": null, 
                    "shard": 6, 
                    "state": "STARTED"
                }, 
                "shard": 6
            }, 
            {
                "reason": "pending operations", 
                "routing": {
                    "index": "commonswiki_file_first", 
                    "node": "Ro1ZcPd9SVetETtEgl8Ceg", 
                    "primary": false, 
                    "relocating_node": null, 
                    "shard": 6, 
                    "state": "STARTED"
                }, 
                "shard": 6
            }
        ], 
        "successful": 19, 
        "total": 28
    }, 
...

We've totally isolated the cluster:

manybubbles@deployment-bastion:/srv/mediawiki$ curl 'deployment-elastic05:9200/_cat/thread_pool?v&h=host,optimize.active,merge.active,warmer.active,suggest.active,snapshot.active,search.active,refresh.active,percolate.active,management.active,index.active,get.active,generic.active,flush.active,bulk.active'
host                 optimize.active merge.active warmer.active suggest.active snapshot.active search.active refresh.active percolate.active management.active index.active get.active generic.active flush.active bulk.active 
deployment-elastic05               0            0             0              0               0             0              0                0                 1            0          0              0            0           0 
deployment-elastic08               0            0             0              0               0             0              0                0                 1            0          0              0            0           0 
deployment-elastic06               0            0             0              0               0             0              0                0                 1            0          0              0            0           0 
deployment-elastic07               0            0             0              0               0             0              0                0                 1            0          0              0            0           0 

But maybe I have a hint:

watch curl -s \''deployment-elastic05:9200/_cat/thread_pool?v&h=host,management.completed'\'
Every 2.0s: curl -s 'deployment-elastic05:9200/_cat/thread_pool?v&h=host,management.completed'                                                                                       Wed Jul  8 15:49:46 2015

host                 management.completed
deployment-elastic05              2519025   <--- These number keep going up
deployment-elastic08              2479017
deployment-elastic06              2506622
deployment-elastic07              2517234

(Harlin) #9

Yes! This is identical to what happened to us. The cluster was completely inactive yet was still claiming to have pending operations. No idea what is causing this.


(Nik Everett) #10

I will know. Soon.


(Nik Everett) #11

Well, not that soon. I've gotten pretty deep and I'm still not sure. Turning on index logging is useful but it doesn't tell me why the IndexWriter thinks there are pending actions. I'll keep looking. Still not sure if its actually a bug or what.


(Harlin) #12

Yeah, I'm starting to lean towards this being a possible bug. Thanks for digging into this!


(Nik Everett) #13

I'll look again in the morning.


Issue with "Synced Flush" for Elasticsearch (ES) V 1.5.2
(Simon Willnauer) #14

hey guys,

I think I know what's going on here... lemme explain a bit how this works:

when a synced flush is scheduled or triggered via a get we do:

  1. flush the engine if its needed (this is determined by a boolean outside called "flushNeeded" of IndexWriter in 1.x)
  2. do the synced flush where we check the indexwriter if there are "pending_changes" (using "IndexWriter#hasUncommittedChanges()"

What can happen if you stop indexing is that we flush the documents and set "flushNeeded = false" now a merge kicks off that caused "IndexWriter#hasUncommittedChanges()" to return true from now on. (note "flushNeeded" is still false since it's maintained outside of indexwriter)

Now you kick off a synced flush and it will not "clean" the indexwriter change count since "flushNeeded" but the second step of synced flush fails since it has not been flushed. I totally see this being a bug though. Yet, I'd like to figure out if that is actually the case so if you send a force flush and then a sync flush it should not give you these errors? Can you run this:

POST /${your_index}/_flush?force=true&wait_if_ongoing=true

And after that:

POST  /${your_index}/_flush/synced

And paste the responses here?


(Nik Everett) #15

Yup. That did it:

manybubbles@deployment-bastion:~$ curl -s deployment-elastic05:9200/_flush/synced?pretty | tee synced_flush_pre
{
  "_shards" : {
    "total" : 765,
    "successful" : 740,
    "failed" : 25
  },
...
# OK problem still exists
manybubbles@deployment-bastion:~$ curl -s 'deployment-elastic05:9200/_flush?pretty&force&wait_if_ongoing' | tee flush_forced
{
  "error" : "ClusterBlockException[blocked by: [FORBIDDEN/5/index read-only (api)];]",
  "status" : 403
}
manybubbles@deployment-bastion:~$ curl -XPUT deployment-elastic05:9200/apifeatureusage-2015.07.08/_settings?pretty -d '{
>   "index.blocks.read_only" : false
> }'
{
  "acknowledged" : true
}
manybubbles@deployment-bastion:~$ curl -s 'deployment-elastic05:9200/_flush?pretty&force&wait_if_ongoing' | tee flush_forced
{
  "_shards" : {
    "total" : 765,
    "successful" : 765,
    "failed" : 0
  }
}
manybubbles@deployment-bastion:~$ head synced_flush_post 
{
  "_shards" : {
    "total" : 765,
    "successful" : 765,
    "failed" : 0
  },
...
# That did it!

(Nik Everett) #16

Victory! I just did a rolling restart in our beta environment with synced flush. It seemed to work. With this workaround we can totally upgrade production! Hurray. Only one more slow restart!


(Simon Willnauer) #17

awesome! thanks for reporting back. It will be fixed in the soon coming 1.6.1 release


(system) #18