Occasional Bulk Insert Failure (ES 2.4.4)

Hi all!

In an application, I fetch rows from multiple tables over distinct JDBC connections in parallel, transform the rows into document fields, and send them to ES in batches of 1000 using the Java Bulk API. I set the timeout of each bulk insert to 15s and retry at most 3 times on failure. The entire process takes more than 6 hours, with at most 6 fetches in parallel, on an ES cluster of 3 beefy VMs (1 data node and 1 master node on each VM). But occasionally some bulk inserts fail even after the retries. How should I diagnose and tackle this problem? (For the record, the index is created with indices.store.throttle.type=none, number_of_shards=6, number_of_replicas=0, index.refresh_interval=-1, and translog.disable_flush=true. Upon successful completion, we revert these to the production settings.)
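The batching and retry loop described above could look roughly like the following minimal plain-Java sketch. It is not the poster's actual code: `BulkSender` is a hypothetical stand-in for whatever builds and executes the bulk request with the Java client, and reading "retry at most 3 times" as one initial attempt plus up to three retries is an assumption. The 15s limit is applied per attempt with `Future.get(timeout)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BulkRetrySketch {

    /** Hypothetical stand-in for the real bulk call; not an Elasticsearch API. */
    public interface BulkSender {
        void send(List<String> docs) throws Exception;
    }

    static final int BATCH_SIZE = 1000;
    static final int MAX_ATTEMPTS = 4;      // assumption: 1 initial attempt + at most 3 retries
    static final long TIMEOUT_MS = 15_000;  // per-attempt client-side timeout

    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    /** Sends one batch, waiting at most timeoutMs per attempt; returns the attempt that succeeded. */
    public static int sendWithRetry(ExecutorService pool, BulkSender sender, List<String> batch,
                                    int maxAttempts, long timeoutMs) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Future<Object> future = pool.submit(() -> {
                sender.send(batch);
                return null;
            });
            try {
                future.get(timeoutMs, TimeUnit.MILLISECONDS);
                return attempt;
            } catch (TimeoutException e) {
                // Only stops waiting locally: a request that already reached the
                // cluster may still be executing there while we retry.
                future.cancel(true);
                last = e;
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                last = (cause instanceof Exception) ? (Exception) cause : e;
            }
        }
        throw last; // every attempt failed or timed out
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) docs.add("{\"id\":" + i + "}");
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Hypothetical sender that only reports the batch size instead of indexing.
        BulkSender printer = batch -> System.out.println("sent " + batch.size() + " docs");
        try {
            for (List<String> batch : partition(docs, BATCH_SIZE)) {
                sendWithRetry(pool, printer, batch, MAX_ATTEMPTS, TIMEOUT_MS);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

One design point worth keeping in mind: a client-side timeout (whether from `Future.get` or a Hystrix command) only stops the caller from waiting. A bulk request that already reached the cluster may still be executing, so a retry can add more indexing load to a cluster that is already stalled.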

Best.

Hey,

can you provide more information about why the bulk inserts failed? Did they fail because of a server or a client issue? Can you provide the responses?

--Alex

Hey Alex!

Sorry for the misunderstanding. By "fail", I mean that my wrapping Hystrix command times out after 15s. I even tried increasing the timeout threshold to 30s. Even then, after a certain number of inserts, some inserts occasionally just keep waiting.
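To find out which batches stall and when, one option is to time every bulk call on the client and keep its wall-clock start time, so that slow batches can later be matched against entries such as GC or flush messages in the Elasticsearch node logs. The following is a minimal sketch; `BatchLatencyLog`, the 100 ms threshold, and the `Runnable` standing in for the real bulk call are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchLatencyLog {

    /** One timed bulk call: when it started (wall clock) and how long it took. */
    public static final class Sample {
        public final long batchId;
        public final Instant start;
        public final Duration elapsed;

        Sample(long batchId, Instant start, Duration elapsed) {
            this.batchId = batchId;
            this.start = start;
            this.elapsed = elapsed;
        }

        @Override
        public String toString() {
            return "batch " + batchId + " started " + start + " took " + elapsed.toMillis() + " ms";
        }
    }

    private final List<Sample> samples = Collections.synchronizedList(new ArrayList<>());

    /** Runs one bulk call and records its start time and duration, even if it throws. */
    public void timed(long batchId, Runnable bulkCall) {
        Instant start = Instant.now();  // wall-clock time, comparable with log timestamps
        long t0 = System.nanoTime();    // monotonic clock for measuring the duration
        try {
            bulkCall.run();
        } finally {
            samples.add(new Sample(batchId, start, Duration.ofNanos(System.nanoTime() - t0)));
        }
    }

    /** Returns the recorded calls that took longer than the threshold. */
    public List<Sample> slowerThan(Duration threshold) {
        List<Sample> slow = new ArrayList<>();
        synchronized (samples) { // iteration over a synchronizedList must hold its lock
            for (Sample s : samples) {
                if (s.elapsed.compareTo(threshold) > 0) slow.add(s);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        BatchLatencyLog log = new BatchLatencyLog();
        log.timed(1, () -> { });                // fast batch
        log.timed(2, () -> sleepQuietly(300));  // simulated stall
        for (Sample s : log.slowerThan(Duration.ofMillis(100))) {
            System.out.println("SLOW " + s);
        }
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

`Instant.toString()` prints UTC timestamps, so they may need converting to the time zone the node logs use before comparing.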

Best.

Hey,

have you checked your Elasticsearch logs from that time? Is a node perhaps doing garbage collection? You will find that in the logs.

--Alex

I think we found the culprit: translog flushes. Although I set translog.disable_flush=true, ES apparently still performs some:

See the idle state in the translog size? That's where the entire ES cluster is busy flushing the translog, which meanwhile causes delays of more than 2 minutes in our bulk inserts. Is translog.disable_flush=true not doing what I expect it to do, or am I misinterpreting its function?

Hey,

is there any reason you decided to disable translog flushing in the first place? This option was removed (and was only useful in tests anyway).

--Alex
