Curl argument size limit?


(searchersteve) #1

This appears to be a cURL issue more than an ES issue, but I couldn't find any discussion of it on http://curl.haxx.se, and it seemed likely that some other ES user has hit the same roadblock.

I frequently use _bulk indexing via the cURL client on Ubuntu Linux:

curl -XPUT 'http://localhost:9200/_bulk' -d '{"index": [......etc. etc.]

Working with a set of 150 large documents recently, I discovered that if I sent <= 35 documents at a time, every document was indexed fine. If I sent > 35 documents, the percentage of documents successfully indexed declined rapidly. The reason was not a failure of ES to index individual documents within a request (there were no ES errors or anything); instead, it appeared that cURL just started choking on entire requests. The number of cURL failures increased with the size of the data payload passed to -d in my request.

Based on these observations, I am speculating that cURL has a limit on the size of the parameters it can handle. Alternatively, this size limit exists somewhere in the operating system.

Further testing shows that the threshold appears to be around 130 kilobytes. Bulk posts >= 132 kilobytes (I'm including the full command in my byte count) fail, while bulk posts <= 128 kilobytes succeed.

For final confirmation, I just now ran the commands interactively and got a shell error that said:

-bash: /usr/bin/curl: Argument list too long

Has anyone encountered this barrier? Is it overridable, or do I need to limit the size of my bulk requests?

Thanks in advance.


(searchersteve) #2

Okay, I've got some answers, but would love some additional input....

This is a Linux shell issue. See this well-researched overview of Linux argument limits, or this oblique reference in the Linux manual.

The Linux constant MAX_ARG_STRLEN restricts the size of any given argument (e.g., the ElasticSearch JSON payload) to 131072 bytes.
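For anyone who wants to check the number on their own machine, here is a rough sketch. It assumes the kernel defines MAX_ARG_STRLEN as 32 pages (which is how the Linux headers define it), so with the common 4096-byte page size it works out to 131072. The variable names are just for illustration.

```shell
#!/bin/sh
# Page size of the running system, in bytes (4096 on most x86 Linux boxes).
page_size=$(getconf PAGE_SIZE)

# Assumption: MAX_ARG_STRLEN is PAGE_SIZE * 32, the per-argument limit.
max_arg_strlen=$((page_size * 32))

# ARG_MAX is a different limit: the total size of all arguments plus environment.
arg_max=$(getconf ARG_MAX)

echo "per-argument limit (assumed 32 pages): $max_arg_strlen bytes"
echo "total argv + environment limit (ARG_MAX): $arg_max bytes"
```

Note that the per-argument limit is what bites here: the whole JSON payload is a single argument to -d, so it fails well before the total ARG_MAX is reached.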

Because MAX_ARG_STRLEN is an immutable constant, the only proposed solutions out there are: a) break up the argument and loop over it in the shell; b) load the (url-encoded) JSON from a file, denoted by @[filename] in place of the JSON. Solution A kind of defeats the whole purpose of bulk indexing in ES, so I'm about to embark on B.
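Solution B might look roughly like the sketch below. The index name, type, ids and document bodies are made up for illustration, and the temp file stands in for wherever the real bulk payload lives. I'm using --data-binary rather than -d here because, as far as I know, -d @file strips newlines, and the bulk format depends on them. Newer ES versions may also want a Content-Type header, which is not shown.

```shell
#!/bin/sh
# Hypothetical payload file; in practice this would be generated elsewhere.
bulk_file=$(mktemp)

# Bulk format: one action line, then one source line, each ending in a newline.
# The index/type/id values below are placeholders.
cat > "$bulk_file" <<'EOF'
{"index": {"_index": "docs", "_type": "doc", "_id": "1"}}
{"title": "first document"}
{"index": {"_index": "docs", "_type": "doc", "_id": "2"}}
{"title": "second document"}
EOF

# curl reads the body from the file, so the payload never goes through
# the shell's argument list and MAX_ARG_STRLEN does not apply.
send_bulk() {
  curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary "@$1"
}

# Uncomment when a node is listening on localhost:9200:
# send_bulk "$bulk_file"
```

Since only the short "@filename" string is passed as an argument, the file itself can be far larger than 128 kilobytes.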

Can anyone steer me in a different direction?


(Daniel Ferreira) #3

Hi, you probably don't need to use curl to upload your bulk request.
You can save your JSON structure in a file and upload it.
Example:
wget --post-file ./huge_struct_data http://localhost:9200/_bulk

