1.7 Billion Document Migration - Circuit Breaking Exception

kurtiskurtis · March 19, 2018, 3:28pm

Setup:
----All machines hosted on AWS EC2----
3 Dedicated Masters (15 GB RAM)
3 D/I Nodes (64 GB RAM, 30 GB dedicated to heap)
1 Network Load Balancer

Scenario:

We are moving our Elasticsearch cluster from a hosted service to an internally managed cluster. In doing so, we decided to re-architect some of our indexes into "per customer indexes". We now have approx. 500 indexes, each with 4 shards and 1 replica. I have been migrating our data from the old cluster into the new one, and so far I have successfully migrated 1.2 billion of the 1.7 billion documents. Up to this point there were only minor issues, which were easily resolved by simple script refactoring.
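For reference, the layout described above works out to the following shard counts. This is plain arithmetic on the numbers in the post; whether the shard count plays any part in the breaker trips is not established in this thread.

```python
# Shard-count arithmetic for the layout described in the post.
indexes = 500              # "approx. 500 indexes"
primaries_per_index = 4    # 4 shards per index
replicas_per_primary = 1   # 1 replica for each
data_nodes = 3             # 3 D/I nodes from the Setup section

# Every primary shard has one replica copy, so each index holds 4 * 2 shards.
total_shards = indexes * primaries_per_index * (1 + replicas_per_primary)
shards_per_node = total_shards / data_nodes

print(total_shards)            # 4000
print(round(shards_per_node))  # 1333
```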

Issue:

I cannot index any more documents without getting a "Circuit Breaking Exception". I have read all the documentation on circuit breaking exceptions, but have not yet found a solution. I have set the field data cache size to 50%, with the breaker limit set to 60% and the total limit set to 70%. The problem persists.
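One way to see which breaker is closest to its limit is to look at the `breakers` section that `GET _nodes/stats/breaker` returns for each node. The sketch below only summarizes such a response once it has been parsed into a dict. The function name `breaker_usage` is hypothetical, and the sample numbers are made up for illustration; they are not taken from this cluster.

```python
def breaker_usage(nodes_stats):
    """Return (node, breaker, used fraction, tripped count) rows, fullest first."""
    rows = []
    for node in nodes_stats.get("nodes", {}).values():
        for name, breaker in node.get("breakers", {}).items():
            limit = breaker.get("limit_size_in_bytes", 0)
            used = breaker.get("estimated_size_in_bytes", 0)
            # Guard against breakers that report no limit.
            fraction = used / limit if limit > 0 else 0.0
            rows.append((node.get("name", "?"), name, fraction, breaker.get("tripped", 0)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical response fragment with round, made-up numbers.
sample = {
    "nodes": {
        "node-id-1": {
            "name": "example-node",
            "breakers": {
                "request": {"limit_size_in_bytes": 600, "estimated_size_in_bytes": 0, "tripped": 0},
                "fielddata": {"limit_size_in_bytes": 800, "estimated_size_in_bytes": 100, "tripped": 0},
                "parent": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 900, "tripped": 3},
            },
        }
    }
}

for node_name, breaker_name, fraction, tripped in breaker_usage(sample):
    print(f"{node_name} {breaker_name}: {fraction:.0%} used, tripped {tripped} times")
```

Comparing this output before and during the migration script would show whether the parent estimate climbs steadily or is already high while the cluster is idle.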

Maybe Helpful information:

I ran GET _stats/_all before running the script and during the script (before it crashes). The text files are too large to put in here so I uploaded them to google drive. This may provide some valuable insight?
https://drive.google.com/open?id=11gTj3F_A24_6MgwbtZvL1GLjfhIseYwl
I included the _all section and the index I am at in this stage of the migration.
The cluster is not currently in use; it is only being written to, with the occasional query to check the status of the migration.
The circuit breaking exception states that the data would be 20.9 GB, which exceeds the limit of 20.9 GB. Since 20.9 GB is approx. 70% of the available heap space, I believe it is the parent circuit breaker that is tripping, as it defaults to 70% of heap.
(For those of you familiar with the Python bulk API): this is the bulk helper call I have used to insert the first 1.2 billion documents without failure:
helpers.bulk(es, generator(account_id), chunk_size=100000, max_retries=3)
I have tried reducing chunk size and that did not solve it.
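The helper call above limits each bulk request by document count only (`chunk_size=100000`), so the size of a request in bytes depends on how large the documents are. The elasticsearch-py bulk helpers also accept a `max_chunk_bytes` argument for this. The sketch below illustrates the same idea in plain Python, capping each chunk by both document count and serialized size. The name `bounded_chunks` and its default limits are hypothetical, and since reducing the chunk size alone did not help here, this is not presented as a fix.

```python
import json


def bounded_chunks(actions, max_docs=500, max_bytes=10 * 1024 * 1024):
    """Group actions into lists capped by document count and serialized size."""
    chunk, size = [], 0
    for action in actions:
        # Approximate the on-the-wire size: JSON body plus a trailing newline.
        action_bytes = len(json.dumps(action).encode("utf-8")) + 1
        # Close the current chunk if adding this action would exceed either cap.
        if chunk and (len(chunk) >= max_docs or size + action_bytes > max_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(action)
        size += action_bytes
    if chunk:
        yield chunk


# Illustrative documents only; the index name and fields are made up.
docs = ({"_index": "example", "_id": i, "body": "x" * 100} for i in range(10))
sizes = [len(chunk) for chunk in bounded_chunks(docs, max_docs=4)]
print(sizes)  # [4, 4, 2]
```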

Exception:

`[2018-03-15T00:23:54,871][WARN ][o.e.a.b.TransportShardBulkAction] [es-data-1] [[583cac778e80276912b44300-breadcrumbs_v1][2]] failed to perform indices:data/write/bulk[s] on replica [583cac778e80276912b$
org.elasticsearch.transport.RemoteTransportException: [es-data-2][172.31.3.107:9300][indices:data/write/bulk[s][r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [22500182661/20.9gb], which is larger than the limit of [2249976709...
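The byte counts in the log line can be checked against the heap size with a little arithmetic. The sketch below assumes a nominal 30 GB heap (from the Setup section) and the 70% total limit described above. The limit in the log is truncated, so only the "would be" value is used.

```python
GIB = 1024 ** 3

would_be = 22500182661                    # "would be" value from the log line
nominal_heap = 30 * GIB                   # assumption: heap is exactly 30 GiB
parent_limit = nominal_heap * 70 // 100   # 70% total limit from the post

print(f"{would_be / GIB:.3f} GiB")        # 20.955 GiB
print(f"{parent_limit / GIB:.1f} GiB")    # 21.0 GiB
```

The visible digits of the logged limit are slightly below this nominal 21 GiB figure, presumably because the heap the JVM reports is a little under the nominal 30 GB. Both the requested size and the limit are shown as 20.9gb in the message, so the difference between them is only visible in the raw byte counts.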

kurtiskurtis · March 19, 2018, 3:29pm

If I have left anything out or you require any more information at all, please let me know. I have been working on this project for a week now, and I would really like to finish this migration.

kurtiskurtis · March 19, 2018, 7:58pm

Shameless self-bump. If I have left out any information that would assist anyone, please let me know. I am desperate to solve this problem.

dadoonet · March 19, 2018, 8:20pm

Read this and specifically the "Also be patient" part.

kurtiskurtis · March 19, 2018, 8:40pm

Note: this isn't intended to offend or be rude, just to justify my actions.

While I wouldn't normally reply in a passive-aggressive manner, I feel that you may want to look over the guidelines you provided, since they specifically say that a reminder ping is welcome.

dadoonet · March 19, 2018, 8:53pm

Sure. It's fine after 2 or 3 days (not including weekends) but not after 5h IMO.

kurtiskurtis · March 19, 2018, 8:57pm

Ok, I will take your 2-3 day rule of thumb into consideration next time. Thanks for everything!

system · April 16, 2018, 8:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
