----All machines hosted on AWS EC2----
3 Dedicated Masters (15 GB RAM)
3 D/I Nodes (64 GB RAM, 30 GB dedicated to heap)
1 Network Load Balancer
We are moving our Elasticsearch cluster from a hosted service to an internally managed cluster. In doing so, we decided to re-architect some of our indexes into per-customer indexes. We now have approximately 500 indexes, each with 4 primary shards and 1 replica. I have been migrating our data from the old cluster into the new one and have successfully migrated 1.2 billion of the 1.7 billion documents. Up to this point there were only minor issues, which were easily resolved by simple script refactoring.
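The figures above imply a fairly dense shard layout. A quick back-of-the-envelope check, using the approximate numbers from this post (about 500 indexes, 4 primary shards each, 1 replica, 3 data nodes) and assuming shards are spread evenly across the data nodes; `shard_layout` is a hypothetical helper:

```python
def shard_layout(num_indices, primaries_per_index, replicas, data_nodes):
    """Return (primary shards, total shard copies, average copies per data node)."""
    primaries = num_indices * primaries_per_index
    total_copies = primaries * (1 + replicas)  # each replica is a full extra copy
    per_node = total_copies / data_nodes       # assumes an even spread
    return primaries, total_copies, per_node


primaries, total, per_node = shard_layout(500, 4, 1, 3)
print(primaries, total, round(per_node))  # 2000 4000 1333
```

Every shard copy carries some fixed heap overhead of its own, so roughly 1,300 copies per data node is worth keeping in mind when reading the heap numbers; this is an observation about the arithmetic, not a confirmed cause of the exception.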
I cannot index any more documents without getting a `CircuitBreakingException`. I have read all of the documentation on circuit breaking exceptions but have not found a solution yet. I have set the fielddata cache size to 50%, the fielddata breaker limit to 60%, and the total (parent) breaker limit to 70%. The problem persists.
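For reference, the limits described above correspond to the setting names below. A minimal sketch, assuming the elasticsearch-py client; `breaker_settings` and `apply_breaker_settings` are hypothetical helper names. The two breaker limits are dynamic cluster settings, whereas the fielddata cache size is a static node setting that belongs in `elasticsearch.yml`:

```python
def breaker_settings(total_limit="70%", fielddata_limit="60%"):
    """Build a cluster-settings body for the two circuit breaker limits.

    Percentages are relative to the JVM heap. indices.breaker.total.limit
    (the parent breaker) and indices.breaker.fielddata.limit are dynamic,
    so they can be changed on a running cluster. indices.fielddata.cache.size
    is static and has to be set in elasticsearch.yml on each node instead.
    """
    return {
        "persistent": {
            "indices.breaker.total.limit": total_limit,
            "indices.breaker.fielddata.limit": fielddata_limit,
        }
    }


def apply_breaker_settings(es, **limits):
    # `es` is assumed to be an elasticsearch-py client instance.
    return es.cluster.put_settings(body=breaker_settings(**limits))
```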
Possibly helpful information:
I ran `GET _stats/_all` before running the script and during the script (before it crashes). The text files are too large to post here, so I uploaded them to Google Drive. They may provide some valuable insight. I included the `_all` section and the index section as they stand at this stage in the migration.
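`_stats/_all` reports index-level statistics; the per-node circuit breaker figures are exposed separately by the node stats API (`GET _nodes/stats/breaker`, or `es.nodes.stats(metric="breaker")` with elasticsearch-py). A small sketch for pulling the parent breaker out of that response, to see which node is closest to its limit; the function name is hypothetical:

```python
def parent_breaker_usage(node_stats):
    """Summarise the parent breaker from a `GET _nodes/stats/breaker` response.

    Returns {node_name: (estimated_bytes, limit_bytes, tripped_count)}.
    """
    usage = {}
    for node in node_stats["nodes"].values():
        parent = node["breakers"]["parent"]
        usage[node["name"]] = (
            parent["estimated_size_in_bytes"],  # current estimated usage
            parent["limit_size_in_bytes"],      # configured limit in bytes
            parent["tripped"],                  # times this breaker has tripped
        )
    return usage
```

Running it before and during the migration script would show whether the estimate climbs steadily on the node that rejects the replica write.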
The cluster is not currently in use; it is only being written to, with an occasional query to check the status of the migration.
The `CircuitBreakingException` states that the data would be 20.9gb, which is larger than the limit of 20.9gb. Since 20.9 GB is approximately 70% of the available heap, I believe it is the parent circuit breaker that is tripping, as it defaults to 70% of the heap (the log entry below also labels it `[parent]`).
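The 70% reasoning can be checked with the raw byte count from the log. A sketch assuming the 30 GB heap means 30 GiB; the helper name is hypothetical, and the limit's own byte count is truncated in the log, so it is not reproduced here:

```python
GIB = 1024 ** 3


def parent_limit_bytes(heap_bytes, percent=70):
    """Parent breaker limit as an integer percentage of the JVM heap."""
    return heap_bytes * percent // 100


heap = 30 * GIB                   # heap size from the node description
limit = parent_limit_bytes(heap)  # 70% of a full 30 GiB heap
requested = 22500182661           # byte count reported in the exception

print(f"70% of 30 GiB: {limit / GIB:.2f} GiB")
print(f"requested:     {requested / GIB:.2f} GiB")
```

The requested size (about 20.95 GiB) sits just under 70% of a full 30 GiB, so a limit that is still lower than the request would fit a heap that the JVM reports as slightly smaller than the configured 30 GB. Either way, the numbers are consistent with the parent breaker at 70%.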
(For those of you familiar with the Python bulk API:) This is the bulk helper call I used to insert the first 1.2 billion documents without failure:
helpers.bulk(es, generator(account_id), chunk_size=100000, max_retries=3)
I have tried reducing the chunk size, but that did not solve it.
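For a sense of scale: the bulk API body is newline-delimited JSON, with one action line plus one source line per document. The sketch below estimates how many bytes a chunk adds up to, which can then be multiplied out to the `chunk_size=100000` above. The function name and the sample document are hypothetical, and the action line is simplified (a real one may also carry `_type`, `_id` or other metadata):

```python
import json


def bulk_payload_bytes(docs, index_name):
    """Approximate size in bytes of a bulk request body for `docs`.

    Mirrors the NDJSON layout of the bulk API: an action line followed by
    the document source, each terminated by a newline. The action line is
    simplified, so real requests will be a little larger.
    """
    total = 0
    for doc in docs:
        action = {"index": {"_index": index_name}}
        total += len(json.dumps(action).encode("utf-8")) + 1  # action line + "\n"
        total += len(json.dumps(doc).encode("utf-8")) + 1     # source line + "\n"
    return total


# Hypothetical sample document; substitute a few real ones from the generator.
sample = [{"account_id": "example", "message": "x" * 1000}] * 100
per_doc = bulk_payload_bytes(sample, "example-index") / len(sample)
print(f"~{per_doc * 100000 / 1024 ** 2:.0f} MB for 100000 documents")
```

Note that elasticsearch-py's bulk helpers also split requests by `max_chunk_bytes` (100 MB by default), so a very large `chunk_size` is capped by payload size as well as by document count.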
`[2018-03-15T00:23:54,871][WARN ][o.e.a.b.TransportShardBulkAction] [es-data-1] [[583cac778e80276912b44300-breadcrumbs_v1]] failed to perform indices:data/write/bulk[s] on replica [583cac778e80276912b$ org.elasticsearch.transport.RemoteTransportException: [es-data-2][172.31.3.107:9300][indices:data/write/bulk[s][r]] Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [22500182661/20.9gb], which is larger than the limit of [2249976709...