We've been running a Hadoop/Hive (Apache 2.7.0) -> ES (1.5) integration for a while and things have worked reasonably well. One thing we've had to keep in mind when setting up scheduled batch jobs has been not to "overwhelm" the ES cluster with the data output from Hive, but still, pushing some 80 million documents in one job has not been a problem... until now.
We upgraded to ES 2.0 (including the latest jars in Hadoop) yesterday, and several of our batch jobs that used to run fine are now failing. The ones failing all do so for the same reason:
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [x.x.x.x:9200] returned Too Many Requests(429) - rejected execution of org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1@3c46fe20 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@593dcd36[Running, pool size = 8, active threads = 8, queued tasks = 50, completed tasks = 626670]]; Bailing out..
I've fiddled with the batch settings such as es.batch.size.bytes and es.batch.size.entries, and while I do see some difference in behaviour, the problem definitely does not go away.
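For reference, here is roughly the kind of configuration I've been tweaking. This is a sketch, not our actual table: the index name, node address, and values are placeholders, but the es.batch.* properties themselves are real es-hadoop settings (including the retry ones, which as I understand it control how the connector backs off when ES rejects a bulk request):

```sql
-- Hypothetical Hive external table backed by ES via es-hadoop.
-- Index/type and node address are illustrative; the property names are real.
CREATE EXTERNAL TABLE es_export (
  id      STRING,
  payload STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource'                = 'myindex/mytype',  -- placeholder index/type
  'es.nodes'                   = 'x.x.x.x:9200',    -- placeholder node
  'es.batch.size.entries'      = '500',   -- smaller bulks (default 1000)
  'es.batch.size.bytes'        = '1mb',   -- default is 1mb
  'es.batch.write.retry.count' = '10',    -- retry more before bailing (default 3)
  'es.batch.write.retry.wait'  = '60s'    -- wait longer between retries (default 10s)
);
```

Lowering the batch sizes and raising the retry count/wait seemed like a way to throttle from the Hadoop side, but as said above it only changes the behaviour somewhat rather than fixing it.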
I do know that I could increase the queue on the ES side, but I also know from experience that increasing the queue can lead to other problems (running out of heap, etc.), so I would like a pointer here: is the way to go to try to throttle the output from Hadoop, and if so, how? Or should I focus on making ES able to handle more requests? (Please note that there is no rush getting the data in there.)