Facing error when trying to reindex from remote


(Prad) #1

I am trying to reindex from remote and I am facing an error. This had been running fine for a few days, but suddenly it started failing with the error below.

My reindex request:

curl -X POST "127.0.0.1:9201/_reindex?pretty=true" -H 'Content-Type: application/json' -d'
{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "http://localhost:9200"
    },
    "index": "my_new_twitter_river"
  },
  "dest": {
    "index": "twittertool2"
  }
}'

The error I am getting:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "circuit_breaking_exception",
        "reason" : "[parent] Data too large, data for [<http_request>] would be [7430139381/6.9gb], which is larger than the limit of [7345215897/6.8gb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=190/190b, accounting=7430139191/6.9gb]",
        "bytes_wanted" : 7430139381,
        "bytes_limit" : 7345215897
      }
    ],
    "type" : "circuit_breaking_exception",
    "reason" : "[parent] Data too large, data for [<http_request>] would be [7430139381/6.9gb], which is larger than the limit of [7345215897/6.8gb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=190/190b, accounting=7430139191/6.9gb]",
    "bytes_wanted" : 7430139381,
    "bytes_limit" : 7345215897
  },
  "status" : 503
}

After some searching, I tried to increase the limit with this:

curl -XPUT localhost:9201/_cluster/settings -H 'Content-Type: application/json' -d '{
  "persistent" : {
    "indices.breaker.fielddata.limit" : "75%"
  }
}'

I got this acknowledgement:

{"acknowledged":true,"persistent":{"indices":{"breaker":{"fielddata":{"limit":"75%"}}}},"transient":{}}

but the error is not resolved.
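Worth noting: the error message names the `[parent]` breaker, and its usage breakdown shows `fielddata=0/0b`, so raising the fielddata limit cannot affect this particular failure. The overall parent limit is controlled by a different setting, `indices.breaker.total.limit`. A sketch of that change (the `85%` value is only illustrative, and raising the breaker only hides the underlying heap pressure, so treat it as a stopgap at best):

```json
{
  "persistent" : {
    "indices.breaker.total.limit" : "85%"
  }
}
```

This would be sent as the body of the same `PUT _cluster/settings` request used above.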

Does anyone have any insights on how to solve this?


(Mark Walkom) #2

It's probably referring to http.max_content_length, https://www.elastic.co/guide/en/elasticsearch/reference/6.5/modules-http.html

But be careful not to increase it too much, as that may cause an OOM.
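For reference, `http.max_content_length` is a static node setting, so it goes in `elasticsearch.yml` and takes effect after a restart rather than via a cluster settings update. A sketch (the `200mb` value is an example, not a recommendation; the default is `100mb`):

```yaml
# elasticsearch.yml -- static setting, requires a node restart
http.max_content_length: 200mb
```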


(Christian Dahlqvist) #3

It sounds like you are suffering from heap pressure. You may need to scale up or out in order to add resources to the cluster, or try to optimise your heap usage. What is the full output from the cluster stats API?

You should also be careful changing circuit breaker settings as pushing this too far can cause serious problems.


(Prad) #4

Output from the cluster stats API:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "twittertool-2.0",
  "cluster_uuid" : "KUcPiIQqT_2OtgOfvxcCsw",
  "timestamp" : 1543216908848,
  "status" : "yellow",
  "indices" : {
    "count" : 2,
    "shards" : {
      "total" : 10,
      "primaries" : 10,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 5,
          "max" : 5,
          "avg" : 5.0
        },
        "primaries" : {
          "min" : 5,
          "max" : 5,
          "avg" : 5.0
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
    },
    "docs" : {
      "count" : 1212735146,
      "deleted" : 260172548
    },
    "store" : {
      "size" : "1.9tb",
      "size_in_bytes" : 2150310164567
    },
    "fielddata" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "total_count" : 0,
      "hit_count" : 0,
      "miss_count" : 0,
      "cache_size" : 0,
      "cache_count" : 0,
      "evictions" : 0
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 762,
      "memory" : "6.9gb",
      "memory_in_bytes" : 7430139191,
      "terms_memory" : "6.5gb",
      "terms_memory_in_bytes" : 7045584686,
      "stored_fields_memory" : "276.2mb",
      "stored_fields_memory_in_bytes" : 289619488,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "3.6mb",
      "norms_memory_in_bytes" : 3847680,
      "points_memory" : "73.6mb",
      "points_memory_in_bytes" : 77275865,
      "doc_values_memory" : "13.1mb",
      "doc_values_memory_in_bytes" : 13811472,
      "index_writer_memory" : "0b",
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory" : "0b",
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set" : "0b",
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : 1542395973142,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 1,
      "data" : 1,
      "coordinating_only" : 0,
      "master" : 1,
      "ingest" : 1
    },
    "versions" : [
      "6.5.0"
    ],
    "os" : {
      "available_processors" : 40,
      "allocated_processors" : 40,
      "names" : [
        {
          "name" : "Linux",
          "count" : 1
        }
      ],
      "mem" : {
        "total" : "125gb",
        "total_in_bytes" : 134291501056,
        "free" : "21gb",
        "free_in_bytes" : 22551023616,
        "used" : "104gb",
        "used_in_bytes" : 111740477440,
        "free_percent" : 17,
        "used_percent" : 83
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 1
      },
      "open_file_descriptors" : {
        "min" : 1032,
        "max" : 1032,
        "avg" : 1032
      }
    },
    "jvm" : {
      "max_uptime" : "1.3h",
      "max_uptime_in_millis" : 5038641,
      "versions" : [
        {
          "version" : "1.8.0_111",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "25.111-b15",
          "vm_vendor" : "Oracle Corporation",
          "count" : 1
        }
      ],
      "mem" : {
        "heap_used" : "9.4gb",
        "heap_used_in_bytes" : 10097095392,
        "heap_max" : "9.7gb",
        "heap_max_in_bytes" : 10493165568
      },
      "threads" : 79
    },
    "fs" : {
      "total" : "43.5tb",
      "total_in_bytes" : 47897179586560,
      "free" : "39.6tb",
      "free_in_bytes" : 43624962523136,
      "available" : "39.6tb",
      "available_in_bytes" : 43624962523136
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 1
      },
      "http_types" : {
        "security4" : 1
      }
    }
  }
}


(Christian Dahlqvist) #5

It looks like you have very large shards (190GB) and that terms memory is taking up 6.5GB of heap space. I would recommend that you increase the heap size or scale out the cluster.
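The arithmetic in the error lines up with these stats: the breaker limit works out to 70% of the reported heap, and the `accounting` usage in the error is exactly `segments.memory_in_bytes`, so segment memory alone was enough to trip the parent breaker. A quick check, using only the numbers from the stats output above:

```python
heap_max = 10493165568      # jvm.mem.heap_max_in_bytes from the stats above
segments_mem = 7430139191   # indices.segments.memory_in_bytes ("accounting" in the error)

# The limit in the error corresponds to 70% of the heap
parent_limit = heap_max * 70 // 100
print(parent_limit)                  # 7345215897 -- the bytes_limit in the error
print(segments_mem > parent_limit)   # True: segment memory alone exceeds the limit
```

So even an idle cluster would sit above the breaker limit, which is why any sizeable request fails.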


(Prad) #6

Thank you!

I was kind of worried that I might end up with very large shards, but I realize the shard count has to be fixed at index creation time! Is there a way to increase the number of shards to scale out, given that I now have 2 TB of data?


(Christian Dahlqvist) #7

You do have the split index API, but this requires your index to have been created with the number_of_routing_shards parameter set, which is currently not enabled by default. It may therefore require reindexing.
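For anyone finding this later, a rough sketch of what that looks like on an index created with the parameter set (index names here are made up; in 6.x the source index must also be made read-only before splitting, and the target shard count must be a multiple of the source count and a factor of `number_of_routing_shards`):

```
PUT tweets_v1
{
  "settings": {
    "index.number_of_shards": 5,
    "index.number_of_routing_shards": 30
  }
}

PUT tweets_v1/_settings
{
  "index.blocks.write": true
}

POST tweets_v1/_split/tweets_v2
{
  "settings": {
    "index.number_of_shards": 10
  }
}
```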


(Prad) #8

Thanks @Christian_Dahlqvist! For now it looks like increasing the heap size solved the problem and it's working again! Thanks a ton!

But I would like a more permanent solution... will check out the split index API.


(Christian Dahlqvist) #9

What is your use case? Are you performing a lot of document updates or just adding and deleting?


(Prad) #10

I am reindexing from remote, copying about 1.5B tweets from Elasticsearch 1.4.4 to Elasticsearch 6.5.
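One knob that may help on a heap-constrained cluster while a reindex like this runs: reindex pulls scroll batches of 1000 documents by default, and the batch size can be lowered with `size` inside `source`, which keeps individual requests smaller. A sketch of the request body from earlier with that added (the value 200 is illustrative, not a tested recommendation):

```json
{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "http://localhost:9200"
    },
    "index": "my_new_twitter_river",
    "size": 200
  },
  "dest": {
    "index": "twittertool2"
  }
}
```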