I know that ES has a bulk API that can load data from a file, as long as each record in the file is preceded by an action/metadata line.
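To make the file format concrete, here is a minimal Python sketch that builds a bulk request body in the newline-delimited layout described above. For each document it writes one action/metadata line followed by one source line, and it ends with the trailing newline the bulk API expects. The function name `build_bulk_body` and the index name `hive_export` are hypothetical. Depending on the Elasticsearch version, the metadata line may also need a `_type` field.

```python
import json

def build_bulk_body(index, docs):
    """Return an NDJSON bulk body: an action/metadata line, then a source line, per document."""
    lines = []
    for doc_id, source in docs:
        # Action/metadata line: tells Elasticsearch what to do with the line that follows.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

# "hive_export" is a hypothetical index name.
body = build_bulk_body("hive_export", [("1", {"user": "alice"}), ("2", {"user": "bob"})])
print(body, end="")
```

The resulting string can be saved to a file and sent to the `_bulk` endpoint.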
However, my use case is loading data from Hive into ES, which I'm currently doing by creating an external Hive table with the ES SerDe. How can I use the ES bulk API to load data from Hive into ES faster?
Are you aware of the Hive integration in ES-Hadoop? I'm not very familiar with it, so I don't know whether it would solve your problem. If it doesn't, it would be interesting to know why.
Yes, I'm aware of the Hive integration with ES, and we are using it to push data from Hive to ES successfully. But the process takes hours. I want to know whether the bulk API can speed up loading data from Hive into ES and, if so, how to use it with Hive. I couldn't find any documentation on that.
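As far as I know, the ES-Hadoop Hive integration already writes through the bulk API, grouping documents into batches governed by the `es.batch.size.entries` and `es.batch.size.bytes` settings. Tuning those, along with how many Hive tasks write in parallel, may therefore be the first thing to try. If you instead want to drive the bulk API yourself, for example from rows exported out of Hive, the following Python sketch shows the batching idea. It splits rows into bulk bodies capped by document count and byte size, then POSTs each body to `_bulk`. The names `bulk_batches`, `send_bulk`, the index `hive_export`, the URL, and the default limits (loosely modeled on ES-Hadoop's batch settings) are all assumptions for illustration.

```python
import json
import urllib.request

def bulk_batches(index, rows, max_docs=1000, max_bytes=1_000_000):
    """Yield NDJSON bulk bodies, flushing before either limit would be exceeded.
    Hypothetical helper; limits loosely mirror es.batch.size.entries / es.batch.size.bytes."""
    batch, batch_bytes, count = [], 0, 0
    for doc_id, source in rows:
        pair = (json.dumps({"index": {"_index": index, "_id": doc_id}}) + "\n"
                + json.dumps(source) + "\n")
        size = len(pair.encode("utf-8"))
        # Flush the current batch if adding this pair would break a limit.
        # A single oversized pair still goes out alone in its own batch.
        if batch and (count >= max_docs or batch_bytes + size > max_bytes):
            yield "".join(batch)
            batch, batch_bytes, count = [], 0, 0
        batch.append(pair)
        batch_bytes += size
        count += 1
    if batch:
        yield "".join(batch)

def send_bulk(body, url="http://localhost:9200/_bulk"):
    """POST one bulk body. The URL is a placeholder for your cluster."""
    req = urllib.request.Request(
        url, data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
    # A successful HTTP response can still report per-document failures.
    if result.get("errors"):
        raise RuntimeError("some documents in the bulk request failed")
    return result
```

Usage would look like `for body in bulk_batches("hive_export", rows): send_bulk(body)`, where `rows` yields `(id, dict)` pairs. Larger batches mean fewer round trips, but very large ones put more memory pressure on the cluster, so the limits are worth measuring rather than guessing.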
By the way, there is a dedicated sub-forum for Hadoop-related questions like this at https://discuss.elastic.co/c/elasticsearch-and-hadoop. Folks there may have more insight into this question than people here in the Elasticsearch forum.