Curl argument size limit?

searchersteve · October 14, 2011, 5:33pm

This appears to be a cURL issue more than an ES issue, but I couldn't find any discussion of it on http://curl.haxx.se, and it seemed likely that some other ES user has hit the same roadblock.

I frequently use _bulk indexing via the cURL client on Ubuntu Linux:

curl -XPUT 'http://localhost:9200/_bulk' -d '{"index": [......etc. etc.]

Working with a set of 150 large documents recently, I discovered that if I sent <= 35 documents at a time, every document was indexed fine. If I sent > 35 documents, the percentage of documents successfully indexed declined rapidly. The reason was not a failure of ES to index individual documents within a request (there were no ES errors or anything); instead, cURL appeared to choke on entire requests. The number of cURL failures increased with the size of the request body I passed with -d.

Based on these observations, I am speculating that cURL has a limit on the size of the parameters it can handle. Alternatively, the limit may exist somewhere in the operating system.

Further testing shows that the threshold appears to be around 130 kilobytes. Bulk posts >= 132 kilobytes (I'm including the full command in my byte count) fail, while bulk posts <= 128 kilobytes succeed.

For final confirmation, I just now ran the commands interactively and got a shell error that said:

-bash: /usr/bin/curl: Argument list too long

Has anyone encountered this barrier? Is it overridable, or do I need to limit the size of my bulk requests?

Thanks in advance.

searchersteve · October 14, 2011, 11:31pm

Okay, I've got some answers, but would love some additional input....

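This is a Linux issue rather than a cURL one: the kernel limits the size of each argument passed to a new program, and bash reports the failure. See this well-researched overview of Linux argument limits, or this oblique reference in the Linux manual.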

The Linux constant MAX_ARG_STRLEN restricts the size of any given argument (e.g., the ElasticSearch JSON payload) to 131072 bytes.
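The limit can be reproduced without curl or Elasticsearch. Below is a minimal sketch, assuming a Linux system where MAX_ARG_STRLEN equals 32 pages (131072 bytes with the usual 4 KiB page size) and counts the argument's terminating NUL byte. It passes one argument just under the limit and one just over it to an external program (`env true`); the helper name `make_arg` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reproduce "Argument list too long" without curl or Elasticsearch.
# Assumes Linux, where MAX_ARG_STRLEN is 32 pages (131072 bytes with 4 KiB
# pages) and the limit includes the argument's terminating NUL byte.
limit=$(( 32 * $(getconf PAGESIZE) ))

# make_arg N: print a string of N 'x' characters (hypothetical helper).
make_arg() { head -c "$1" /dev/zero | tr '\0' 'x'; }

fits=$(make_arg $(( limit - 1 )))   # limit - 1 chars + NUL = limit bytes
too_big=$(make_arg "$limit")        # one byte over the limit once NUL is added

# env is an external program, so each argument has to go through execve().
if env true "$fits"; then fits_result=ok; else fits_result=failed; fi
if env true "$too_big" 2>/dev/null; then too_big_result=ok; else too_big_result=failed; fi

echo "$(( limit - 1 ))-byte argument: $fits_result"
echo "${limit}-byte argument: $too_big_result"
```

On a typical Linux system the first argument should be accepted and the second rejected, which lines up with the 128 KB / 132 KB boundary observed earlier in the thread.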

Because MAX_ARG_STRLEN is a constant compiled into the kernel, the only proposed solutions out there are: a) break up the argument and loop over the pieces in the shell; b) load the JSON from a file, by writing @filename in place of the JSON. Solution A kind of defeats the whole purpose of bulk indexing in ES, so I'm about to embark on B.
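Option (b) might look like the following minimal sketch. The file name `bulk_body.ndjson`, the index name `docs`, the `ES_URL` variable and the `post_bulk` helper are all hypothetical. It uses curl's `--data-binary` rather than `-d`, because `-d @file` strips newlines from the file and the _bulk endpoint expects one action or document per line.

```shell
#!/usr/bin/env bash
# Sketch of option (b): keep the bulk body in a file so that it is never a
# command-line argument. File name, index name and URL are hypothetical.
set -euo pipefail

ES_URL=${ES_URL:-http://localhost:9200}
BODY=bulk_body.ndjson

# Bulk format: an action line followed by a source line, each ending in "\n".
# Older Elasticsearch versions also expect a "_type" in the action line.
printf '%s\n' \
  '{"index": {"_index": "docs", "_id": "1"}}' \
  '{"title": "first document"}' > "$BODY"

# post_bulk FILE: send FILE to the _bulk endpoint.
# --data-binary @FILE sends the file byte for byte; -d @FILE would strip the
# newlines that _bulk relies on. Only the file *name* appears in argv, so the
# body size is no longer bounded by MAX_ARG_STRLEN.
post_bulk() {
  curl -s -XPUT "$ES_URL/_bulk" \
    -H 'Content-Type: application/x-ndjson' \
    --data-binary @"$1"
}
```

Usage: `post_bulk bulk_body.ndjson`. Since only the file name is passed on the command line, the size of the body is limited by the server rather than by the kernel.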

Can anyone steer me in a different direction?

Daniel_Ferreira · October 15, 2011, 12:15am

Hi, you probably don't need to use curl to upload your bulk request.
You can save your JSON in a file and upload it.
Example:
wget --post-file ./huge_struct_data http://localhost:9200/_bulk
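If a single file grows very large (the server also caps request size, via `http.max_content_length` in Elasticsearch), it can be split into several smaller bulk files before uploading, which combines options (a) and (b) from earlier in the thread. Below is a minimal sketch, assuming standard `split` and `wget`; the helper names `split_bulk` and `post_chunks`, the chunk prefix and the chunk size are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split one large bulk file into smaller bulk files and upload each
# with wget --post-file. Assumes every action line is followed by exactly one
# source line (true for "index" and "create", not for "delete"), so an even
# number of lines per chunk keeps each action together with its document.
set -euo pipefail

LINES_PER_CHUNK=${LINES_PER_CHUNK:-2000}   # hypothetical: 1000 documents per request

# split_bulk FILE PREFIX: write PREFIXaa, PREFIXab, ... with at most
# LINES_PER_CHUNK lines each.
split_bulk() {
  split -l "$LINES_PER_CHUNK" "$1" "$2"
}

# post_chunks PREFIX URL: upload each chunk as its own bulk request.
# Newer Elasticsearch versions reject wget's default form-encoded
# Content-Type, so an NDJSON type is set explicitly.
post_chunks() {
  local chunk
  for chunk in "$1"*; do
    wget -q -O - --header='Content-Type: application/x-ndjson' \
      --post-file="$chunk" "$2"
  done
}
```

Usage: `split_bulk huge_struct_data chunk_ && post_chunks chunk_ http://localhost:9200/_bulk`. Splitting by line count rather than by bytes keeps the sketch simple; the chunk size would need tuning to the size of the documents.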
