When we load data from Hadoop into Elasticsearch (via es-hadoop), we keep seeing errors like this in the tasks:
org.elasticsearch.hadoop.EsHadoopException: Could not write all entries [99/347072] (maybe ES was overloaded?). Bailing out...
Since our Hadoop cluster can read and write data at an enormous rate, I am not surprised that our (much smaller) Elasticsearch cluster cannot keep up. Fair enough. So this question is not about tuning Elasticsearch for faster indexing.
My question is: why can Elasticsearch not push back in some way and slow the Hadoop job down to a rate that Elasticsearch can sustain? It seems Elasticsearch will happily keep accepting data at a rate it simply cannot handle...
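For reference, this is roughly how the job writes. This is only a minimal sketch of the MapReduce EsOutputFormat path; the hosts, index name, and setting values below are illustrative placeholders rather than our actual configuration, and the es.batch.* options are the standard es-hadoop bulk settings.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class EsLoadJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("es.nodes", "es-node1:9200");        // placeholder ES host
        conf.set("es.resource", "myindex/mytype");    // placeholder index/type

        // Standard es-hadoop bulk settings (values here are illustrative):
        conf.set("es.batch.size.entries", "1000");    // docs per bulk request
        conf.set("es.batch.size.bytes", "1mb");       // bytes per bulk request
        conf.set("es.batch.write.retry.count", "3");  // retries before the task bails out
        conf.set("es.batch.write.retry.wait", "10s"); // wait between retries

        // es-hadoop docs recommend disabling speculative execution for ES writes
        conf.setBoolean("mapreduce.map.speculative", false);
        conf.setBoolean("mapreduce.reduce.speculative", false);

        Job job = Job.getInstance(conf, "load-into-es");
        job.setOutputFormatClass(EsOutputFormat.class);
        // mapper and input setup omitted; job.waitForCompletion(true) would submit
    }
}
```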