Importing a large number of JSON documents via bulk import fails with no error message

I am trying to import documents stored in a JSON file (1.4 GB) via the bulk API with the following command:

 curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/{index}/_bulk --data-binary "@<filename>.json"

With small JSON files (< 100 MB) it works as expected, but with larger files such as the one mentioned above, it just exits with no message at all. There is no info in /var/log/elasticsearch/cluster.log either.

My current cluster (mesh VPN) specs:

Master eligible node: 32GB RAM, 1TB HDD, 4 cores (jvm: Xms16g, Xmx16g)
Data node 1: 32GB RAM, 1TB HDD, 4 cores (jvm: Xms16g, Xmx16g)
Data node 2: 32GB RAM, 1TB HDD, 4 cores (jvm: Xms16g, Xmx16g)

The structure of the JSON documents is:

"
{"index": {"_index": "<index-title>", "_type": "<type-title>, "_id": "<document-id>"}}
{"attr01": val01, "attr02: "attr02", [...]}
<repeating for all documents>

"

What could be the issue?

Where can I find log information about the failing import?

Hi @BinaryIsPrimary,
You should increase the http.max_content_length parameter in elasticsearch.yml to the value you need and restart Elasticsearch, for example:

http.max_content_length: "500mb"

By default, it is equal to 100mb.
The documentation is here.
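
After restarting, you can confirm that a node has picked up the setting via the nodes info API (host and port are placeholders; the setting only appears in the response once it has been set explicitly in elasticsearch.yml):

 curl -s "localhost:9200/_nodes/settings?filter_path=nodes.*.settings.http.max_content_length&pretty"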


The bulk index API is designed to efficiently index groups of documents in a single request. It is, however, not designed to index all documents in one request, and it is generally recommended to keep the size of each bulk request around 5MB or so. I would therefore recommend breaking your file up into multiple smaller ones rather than trying to increase the http.max_content_length parameter, which can put a lot of load on the cluster.
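
As a rough sketch of how to do that from the shell (the file name, index, and chunk size are placeholders; tune the line count so each chunk ends up around 5MB, and keep it even so action and source lines stay paired):

 # each document takes two lines (action + source), so split on an even line count
 split -l 10000 <filename>.json bulk_chunk_

 # send each chunk as a separate bulk request
 for f in bulk_chunk_*; do
   curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/{index}/_bulk --data-binary "@$f"
 done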


That makes sense, I will split them up then. Thank you.

@nugusbayevkk Thanks for your advice, too. I increased the value, but the result is still the same. I will follow @Christian_Dahlqvist's advice.
