This appears to be a cURL issue more than an ES issue, but I couldn't find any discussion of it on http://curl.haxx.se, and it seemed likely that some other ES user has hit the same roadblock.
I frequently use _bulk indexing via the cURL client on Ubuntu Linux:
curl -XPUT 'http://localhost:9200/_bulk' -d '{"index": [......etc. etc.]
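(For reference, a complete minimal request in that format looks roughly like the following; the index, type, and field names here are just placeholders, and the trailing newline is part of the _bulk format:)

curl -XPUT 'http://localhost:9200/_bulk' -d '
{"index": {"_index": "test", "_type": "doc", "_id": "1"}}
{"title": "example document"}
'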
Working with a set of 150 large documents recently, I discovered that if I sent <= 35 documents at a time, every document was indexed fine. If I sent > 35 documents, the percentage of documents successfully indexed declined rapidly. The cause was not ES failing to index individual documents within a request (there were no ES errors at all); instead, cURL appeared to be choking on entire requests. The number of cURL failures grew with the size of the data argument in my request.
Based on these observations, I suspect that either cURL has a limit on the size of the arguments it can handle, or the limit exists somewhere in the operating system.
Further testing shows that the threshold appears to be around 130 kilobytes. Bulk posts >= 132 kilobytes (I'm including the full command in my byte count) fail, while bulk posts <= 128 kilobytes succeed.
For final confirmation, I just now ran the commands interactively and got a shell error that said:
-bash: /usr/bin/curl: Argument list too long
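If I understand correctly, that error comes from the kernel's limit on the total size of arguments passed to execve() (ARG_MAX), not from cURL itself. On older Linux kernels this limit was fixed at 131072 bytes (128 KiB), which would match the ~130-kilobyte threshold I observed. The current value can be checked with:

getconf ARG_MAX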
Has anyone else encountered this barrier? Can it be overridden, or do I need to limit the size of my bulk requests?
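If it can't be overridden, the workaround I'm planning to try is writing the bulk payload to a file and having cURL read it from disk, so the data never passes through the shell's argument list at all. (It would need --data-binary rather than -d: with an @file argument, -d strips the newlines that the _bulk format depends on, while --data-binary sends the file verbatim.)

curl -XPUT 'http://localhost:9200/_bulk' --data-binary @bulk_payload.json

Here bulk_payload.json is a placeholder for a file containing the same action/source lines I would otherwise pass inline.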
Thanks in advance.