Indexing of a large nested object fails in ES 9.1.4

Hello All,

When we perform indexing in ES 9.1.4, we encounter the following issue.

Indexing error occurred: method [POST], host [ip:9200], URI [/_bulk?timeout=4m], status line [HTTP/1.1 413 Request Entity Too Large]

errors = org.elasticsearch.client.ResponseException: method [POST], host [IP:9200], URI [/_bulk?timeout=4m], status line [HTTP/1.1 413 Request Entity Too Large]

  at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:351)

  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:317)

  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:292)

  at co.elastic.clients.transport.rest_client.RestClientHttpClient.performRequest(RestClientHttpClient.java:92)

  at co.elastic.clients.transport.ElasticsearchTransportBase.performRequest(ElasticsearchTransportBase.java:151)

  at co.elastic.clients.elasticsearch.ElasticsearchClient.bulk(ElasticsearchClient.java:550)

  at

Each document contains a nested object, and it is very large. The same indexing worked in ES 7.17, but it fails with this error in ES 9.1.4. Is there any solution for this?

Thanks

You need to reduce the bulk size, I guess.
How many documents are you sending within the bulk request?
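
For illustration, a minimal sketch of what batching could look like with the same Java client that appears in your stack trace (the index name, document type, and batch size here are hypothetical, not taken from your setup):

  import co.elastic.clients.elasticsearch.ElasticsearchClient;
  import co.elastic.clients.elasticsearch.core.BulkRequest;
  import co.elastic.clients.elasticsearch.core.BulkResponse;

  import java.io.IOException;
  import java.util.List;
  import java.util.Map;

  public class BatchedIndexer {

      // Send the documents in small batches so that each _bulk request body
      // stays well below http.max_content_length (100 MB by default).
      static void indexInBatches(ElasticsearchClient client,
                                 List<Map<String, Object>> docs,
                                 int batchSize) throws IOException {
          for (int i = 0; i < docs.size(); i += batchSize) {
              List<Map<String, Object>> batch =
                  docs.subList(i, Math.min(i + batchSize, docs.size()));

              BulkRequest.Builder br = new BulkRequest.Builder();
              for (Map<String, Object> doc : batch) {
                  br.operations(op -> op.index(idx -> idx
                      .index("my-index")      // hypothetical index name
                      .document(doc)));
              }

              BulkResponse response = client.bulk(br.build());
              if (response.errors()) {
                  // inspect response.items() and handle per-document failures here
              }
          }
      }
  }

The batch size has to be tuned against the serialized size of your documents, not just the document count.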

We send 200 documents in each bulk request, and some of the documents are around 750 MB due to a large nested object. We are not able to reduce the size of a single document due to business needs, so reducing the bulk size does not help either, since Elasticsearch's default limit is 100 MB (http.max_content_length). We need an alternative solution.
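
For reference, http.max_content_length is a static node setting, so it cannot be changed through the cluster settings API. As a sketch (the value below is just an example), overriding it would mean something like this in elasticsearch.yml on every node, followed by a restart:

  # elasticsearch.yml (static setting, applies per node, requires a restart)
  http.max_content_length: 1gb

As the replies below point out, raising this limit mostly postpones the problem rather than solving it.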

So I doubt this worked, as this limit has been there for years.
Are you sure you sent a 750 MB bulk request to ES 7?

What type of content are you sending? Are those BASE64 binary attachments?

When I have encountered similar issues in the past, it has often been due to one of two causes:

  1. Use of large binary blobs, as David pointed out
  2. A data model where each document uses a potentially deeply nested structure to represent an entity, or part of an entity, from a relational model. Sometimes this is combined with parent-child in an attempt to replicate relational concepts in Elasticsearch.

Having looked at your prior queries, it sounds like you may suffer from point 2. Elasticsearch is not optimised for storing very large nested documents, and using it this way can cause a number of different performance problems, e.g.:

  • Indexing these very large documents will be slow and consume a lot of resources. You may, as you have seen, hit limits that you need to override, thereby kicking the can down the road, but you are potentially making the situation worse in the future. At some point you will no longer be able to increase the limit, and at that point correcting the structure will be more difficult.
  • As each nested sub-document is represented by a separate document behind the scenes, updating very large nested documents is quite expensive and results in a lot of overhead.
  • Very large and complex documents can also lead to uneven shard sizes and hotspots that cause performance problems.
  • Querying very large documents can also cause problems, both on the server and the client side, as deserializing these documents can be quite slow.
  • If individual documents represent relational hierarchies (or similar), the largest ones tend to grow the fastest and be updated more frequently than smaller documents, thereby making the problem worse.

Trying to replicate relational concepts and structures in Elasticsearch using parent-child and nested documents is IMHO an anti-pattern and rarely the right thing to do. If this is what you are doing, I would recommend rethinking how you structure your data instead of trying to apply a quick band-aid to ease the pain.
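
As a purely illustrative sketch (the Order/OrderItem names and fields below are invented, not taken from your data model), restructuring typically means replacing one ever-growing document with a huge nested array by many small, flat documents that each carry a reference to their parent:

  import java.util.List;
  import java.util.Map;

  public class FlattenedModel {

      // Anti-pattern: one document whose nested "items" array keeps growing.
      record Order(String orderId, Map<String, Object> header,
                   List<Map<String, Object>> items) {}

      // Flattened alternative: one small document per item, each carrying the
      // parent id so items can be filtered or aggregated per order at query time.
      record OrderItem(String orderId, Map<String, Object> attributes) {}

      static List<OrderItem> flatten(Order order) {
          return order.items().stream()
              .map(item -> new OrderItem(order.orderId(), item))
              .toList();
      }
  }

Each flattened document stays small, so bulk requests stay under the size limit and an update only touches the items that actually changed.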