There were several variables in play for this scenario, and my impression is that shard (re)allocation can be expensive. With that in mind, I'll try to present a succinct description so that if the answer is "don't do that", I've sacrificed few electrons and few (of your) brain cells.
We wanted to expand our system and upgrade from 1.4.4 to 1.6. Our system is
- 6 logstash boxen (using node protocol)
- 5 es data nodes (32G RAM/1.6TB)
- 1 es no-data no-master (http frontend)
- 5 shards / 1 replica per index
- 7 daily indices (so 70 shards per day) / ~30 days retention
- 5.5 TB total storage used
We started by adding four 1.6 nodes to the cluster. Things went OK, apart from some strange unbalanced shard distribution and periods of no data ingestion overnight. We closed a majority of the indices, then moved the original nodes from 1.4.4 to 1.6.
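For reference, we closed and re-opened indices with the standard index close/open APIs; something like the following, where the index name is just an example:

```shell
# Close an index before moving nodes (index name is hypothetical)
curl -XPOST 'localhost:9200/logstash-2015.06.01/_close'

# Re-open it once the nodes are back
curl -XPOST 'localhost:9200/logstash-2015.06.01/_open'
```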
When we started re-opening indices (several at once), ES node load went through the roof and ingestion of data stopped. Data ingestion would not resume until all shards were allocated and initialized and the subsequent re-allocation activity was complete. We were using kopf as the indicator of these operations.
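In case it helps anyone reproduce what we were watching: the same state kopf shows can be checked from the command line (assuming the default HTTP port on the frontend node):

```shell
# Cluster-wide view: status plus initializing/relocating shard counts
curl 'localhost:9200/_cluster/health?pretty'

# Per-shard recovery progress
curl 'localhost:9200/_cat/recovery?v'
```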
We used the kopf "cluster settings" page to manipulate parameters:
- routing: concurrent_rebalance, concurrent_recoveries
- recovery: concurrent_streams, max_bytes_per_sec
In the end we were using:
- concurrent_rebalance=1, concurrent_recoveries=10
- concurrent_streams=3, max_bytes_per_sec=200M
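For completeness, my understanding is that the kopf changes above correspond to something like the following against the cluster settings API (setting names as I believe they are in 1.x; transient, so they don't survive a full cluster restart):

```shell
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 1,
    "cluster.routing.allocation.node_concurrent_recoveries": 10,
    "indices.recovery.concurrent_streams": 3,
    "indices.recovery.max_bytes_per_sec": "200mb"
  }
}'
```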
... and the cluster just would NOT ingest data until all the shard processing was complete.
I have seen performance degradation around addition/deletion of nodes, but this just seemed SO finicky, unexpected, and mildly concerning. We just were not aware of this level of impact.
I have tried to balance info with brevity. I'm hoping it's best to provide any additional info in response to questions or hints from more informed users.
Thanks so much, and thanks to the Elastic team: awesome stuff!