Throttling write speed in ES-Hadoop Connector

(Kamaldeep Singh) #1


I was working with the ES-Hadoop connector and I see that if your server is low on memory, writes keep getting dropped:
org.elasticsearch.hadoop.EsHadoopException: Could not write all entries (maybe ES was overloaded?). Bailing out...

As mentioned in "Pushback to hadoop from es on bulk load", there's no bi-directional communication between Hadoop and the connector: the connector cannot say "there's too much data, slow down".

Does anyone think it might be a good idea to use something like blocking queues here and add acks while writing (Kafka 101) :slight_smile: so that the consumer (the thread on the ES side) can read at a slower pace?
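To make the idea concrete, here is a minimal sketch of back-pressure with a bounded queue. It is not how the connector works today; `BackPressureSketch`, `run`, the `POISON` sentinel and the sleep that stands in for a slow bulk request are all hypothetical names and assumptions of mine. The producer (the Hadoop task) blocks on `put` whenever the queue is full, so it can never get ahead of the consumer by more than `capacity` documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackPressureSketch {
    // Hypothetical sentinel telling the consumer there is nothing more to read.
    private static final String POISON = "__END__";

    /** Pushes `total` documents through a bounded queue; returns how many were acknowledged. */
    public static int run(int total, int capacity, long consumerDelayMillis)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        List<String> acked = new ArrayList<>();

        // Consumer: stands in for the thread that sends documents to ES.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String doc = queue.take();
                    if (doc.equals(POISON)) {
                        break;
                    }
                    Thread.sleep(consumerDelayMillis); // assumed slow write
                    acked.add(doc);                    // the "ack"
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producer: put() blocks while the queue is full, which is the throttle.
        for (int i = 0; i < total; i++) {
            queue.put("doc-" + i);
        }
        queue.put(POISON);
        consumer.join(); // join() makes the consumer's writes to `acked` visible here
        return acked.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("acked: " + run(20, 4, 5));
    }
}
```

With a small `capacity` the producer spends most of its time waiting in `put`, which is the "slow down" signal that is missing today. Nothing is dropped; the cost is that the Hadoop task runs longer.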

Otherwise we would have to tune the batch size, write speed, and HTTP timeouts ourselves.
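For reference, the manual tuning route would look roughly like the sketch below. The setting names are the ones I know from the ES-Hadoop configuration docs; the values are purely illustrative guesses for a small cluster, not recommendations, and `ThrottleSettingsSketch` is a hypothetical helper. In a real job these pairs would be set on the Hadoop `Configuration` (or the equivalent for Spark, Hive, etc.).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThrottleSettingsSketch {
    /** Returns illustrative (assumed) values for the write-related ES-Hadoop settings. */
    public static Map<String, String> conservativeSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        // Smaller bulk requests: fewer documents and fewer bytes per batch.
        settings.put("es.batch.size.entries", "500");
        settings.put("es.batch.size.bytes", "512kb");
        // Retry rejected bulk entries more often and wait longer between attempts.
        settings.put("es.batch.write.retry.count", "6");
        settings.put("es.batch.write.retry.wait", "30s");
        // Give a slow cluster more time to answer before the HTTP call times out.
        settings.put("es.http.timeout", "2m");
        return settings;
    }

    public static void main(String[] args) {
        conservativeSettings().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Keep in mind that the batch sizes apply per task, so the load that actually reaches ES also depends on how many tasks are writing in parallel.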

I'm open to building/contributing this if it's a good idea.

(Zhifeng Ma Beijing) #2

I like the idea!
