900000ms timeout during shard recovery


(John Ouellette) #1

Hi -- I am having significant problems with my small ES cluster. I'll accept that the problems are likely with my setup and not ES :slight_smile: but I am on the verge of giving up... :frowning:

One problem I am seeing is that every time I restart a node, or ES on a node crashes and is restarted, shard recovery takes days -- if it ever completes. I have had to wipe all of the data on my cluster and backfill several times when recovery doesn't complete. The shards that do not recover tend to be large: over 30GB. The error I am currently seeing in my logs suggests that there is a 900000ms recovery timeout which is preventing these shards from being transferred to the recovering node. (See error message below.)

My cluster is composed of two data nodes (96GB RAM, 24 cores) and three master nodes (no data, 64GB RAM, 16 cores) which are also used for Logstash indexing. I am running ES 2.0. Each ES instance has 30GB of heap allocated to it (this was reduced from something much larger after reading that the JVM cannot use compressed object pointers with heaps larger than about 32GB). Each index is usually configured to have one shard and one replica.

  • what is the 900s timeout I am hitting?
  • is it configurable, or should it be left alone and I should change something else with my cluster?
  • are these indexes/shards unusual in size?
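In case it helps, here is how I've been listing shard sizes, using the cat API (the host name is one of my nodes; adjust as needed):

```shell
# List every shard with its state and on-disk size (store column).
# "muninn" is one of my data nodes -- substitute your own host.
curl -s 'http://muninn:9200/_cat/shards?v&h=index,shard,prirep,state,store'
```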

Caused by: RemoteTransportException[[muninn][132.246.195.227:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [190] files with total size of [95gb]]; nested: ReceiveTimeoutTransportException[[huginn][132.246.195.226:9300][internal:index/shard/recovery/prepare_translog] request_id [23511214] timed out after [900000ms]];
Caused by: [tomcat-svc-2015.02][[tomcat-svc-2015.02][0]] RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [190] files with total size of [95gb]]; nested: ReceiveTimeoutTransportException[[huginn][132.246.195.226:9300][internal:index/shard/recovery/prepare_translog] request_id [23511214] timed out after [900000ms]];
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:135)
at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:127)
at org.elasticsearch.indices.recovery.RecoverySource.access$200(RecoverySource.java:53)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:136)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:133)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:299)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


(Christian Dahlqvist) #2

How much data do you have in the cluster? How many indices and shards do you have in the cluster? What is the size of the largest shard?


(John Ouellette) #3

Hi Christian -- Currently the cluster has 109 shards and 42 indices. Most indices are configured to have one shard and one replica. Most of my indices are dedicated to logging for two applications, so I have two main _types and each index only gets one type. A new index is started at the beginning of the month, and I want to keep most data for about a year. So, although I have 42 indices, most are split between these two types, and the most active index is (naturally) the one for the current (and previous) month. The largest shard is 113.9GB. Any shard larger than around 30GB has the problem I described in my previous message. The total amount of data in the cluster is about 1TB.

Thanks for any help you can offer!
John


(Christian Dahlqvist) #4

We generally recommend keeping shards smaller than 50GB, primarily to make sure they do not cause problems during recovery. By default, recovery is throttled quite a bit in order not to overwhelm the cluster, but you can override these settings and make it more aggressive.
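For example, something along these lines (the values are illustrative, not recommendations, and I believe the 900000ms/15m value you are hitting corresponds to the recovery internal action timeout settings):

```shell
# Transient settings to make recovery more aggressive on an ES 2.x cluster.
# Revert these once the cluster is green again.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "100mb",
    "indices.recovery.concurrent_streams": 5,
    "indices.recovery.internal_action_long_timeout": "30m"
  }
}'
```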


(John Ouellette) #5

Thanks Christian -- that is interesting to know! Most of what I have read in the ES documentation focuses on the number of shards, not their size: in fact, I don't think I've ever read anything in the ES docs saying that there is a practical size limit for a shard. A lot of the documentation talks about how having too many shards is not a good thing; the flip side of that is that, if you're not going to have a lot of shards, each shard will tend to be larger...

It makes sense that there is a practical limit to shard size: this must be a function of the amount of resources that you can dedicate to the various functions in ES (recovery, search, etc.) and the amount of RAM left on the host for file caching. There are so many knobs and levers in ES, it is hard to know when you're hitting a limit like that and when you just need to fiddle with those settings :slight_smile:

So, I could try to split my indexes into more shards -- which will require reindexing -- or create more indexes -- which will also require reindexing -- but the ES docs are very explicit in saying that these are equivalent. Through past experimentation, I've found that too many shards in a cluster, or too many indexes, is a recipe for failure, too: I had to rebuild my cluster a few times before I learned this....

I have two main indexes -- one for each of two main applications whose logs I am capturing. The rate of indexing in the most active is typically about 100-200 events/s, with occasional peak rates much higher than that. Each of these indexes currently has one shard and one replica, and the index receives data for a calendar month, at the end of which a new index is created and data stops going into the old index and starts going into the new. My searches are mostly manual, through Kibana, at this point, although I am writing queries to run automatically and frequently: these searches will focus mostly on the last hour's or day's worth of data. Most of my manual searches focus on the last few weeks' worth of data, although I need to search almost as frequently on the last few months' or the last year's data.
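If I do split future monthly indexes into more primary shards, I gather this could be done with an index template so each new month picks it up automatically. A sketch of what I have in mind (template name and shard count are just placeholders I'm considering, not settled values):

```shell
# ES 2.x index template: any new index matching the pattern gets
# 3 primary shards and 1 replica instead of the default single shard.
curl -XPUT 'http://localhost:9200/_template/monthly_logs' -d '{
  "template": "tomcat-svc-*",
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'
```

This would only affect indexes created after the template is in place; existing indexes would still need reindexing.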

From my usage, I would say my cluster needs to be more optimized for indexing and replication than for search. Given that I currently have the one index per month, and one shard per index, and that my cluster is composed of two data-only nodes and three master-only nodes (which are also used for search), how could I best configure my cluster and indexing?

Thanks!
John


(John Ouellette) #6

Hi Christian -- I have been looking at the settings you mentioned, and have tried experimenting a bit. I also started watching the _recovery endpoint on one of the indexes as it was being recovered. For the one index I was monitoring, I watched the 'recovered_in_bytes' value grow until it was 100% complete and the recovery stage move to VERIFY_INDEX. This eventually finished and, as far as I could tell, the index was up again. A short while later, I started receiving the error below, and the recovery started all over again. Any idea why this could be happening? There are no hardware errors on the system, no memory problems, no restarts.
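For reference, I was polling the recovery status like this (host and index names are mine):

```shell
# Show in-flight recoveries for one index; ?human renders sizes and
# times in readable units, active_only skips completed recoveries.
curl -s 'http://localhost:9200/tomcat-svc-2015.04/_recovery?human&active_only=true&pretty'
```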

The interesting bit in the log is the IndexNotFoundException -- the index is definitely there...

[2015-12-14 16:37:29,043][DEBUG][action.admin.indices.stats] [muninn] [indices:monitor/stats] failed to execute operation for shard [[tomcat-svc-2015.04][0], node[xWEfHZ5qQc-WtRMozNF_dQ], [R], v[1394], s[INITIALIZING], a[id=KM7K9HW9Qt28vFqt
HHGKxw], unassigned_info[[reason=ALLOCATION_FAILED], at[2015-12-15T00:37:00.910Z], details[failed to create shard, failure ElasticsearchException[failed to create shard]; nested: LockObtainFailedException[Can't lock shard [tomcat-svc-2015.0
4][0], timed out after 5000ms]; ]]]
[tomcat-svc-2015.04][[tomcat-svc-2015.04][0]] BroadcastShardOperationFailedException[operation indices:monitor/stats failed]; nested: IndexNotFoundException[no such index];
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:399)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:376)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:365)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:299)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: [tomcat-svc-2015.04] IndexNotFoundException[no such index]
at org.elasticsearch.indices.IndicesService.indexServiceSafe(IndicesService.java:295)
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:97)
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:47)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:395)
... 7 more
