How to do bulk insert from Hive to Elasticsearch for better data load performance?

ravi_yadav · July 8, 2016, 2:17am

I know that ES has a bulk API that can load data from a file, provided the file has an action/metadata line before the actual data for each record.

However, my use case is to load data from Hive into ES, which I'm doing by creating an external Hive table with the ES SerDe. How can I leverage the ES bulk API to load data faster from Hive to ES?

I'm using ES 2.1.

Thanks!
Ravi

james.baiera · July 11, 2016, 3:19pm

That's actually the underlying implementation of the es-hadoop connector for loading data into Elasticsearch. An external table pointing to Elasticsearch is configured with a storage handler that provides Hive with hooks for reading data out of Elasticsearch (via the scroll API) and for loading data into Elasticsearch (via the bulk API). So an INSERT into the external table already goes through the bulk API; there is no separate bulk step to enable.
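To make the bulk format concrete, here is a rough Python sketch of how rows can be grouped into bulk request bodies, with each document preceded by its action/metadata line. This is only an illustration of the idea, not es-hadoop's actual code: `bulk_bodies`, the index name `hive_export`, the type name `row`, and the sample rows are all made up for the example. The `max_entries` parameter loosely mirrors the connector's `es.batch.size.entries` setting, which (together with `es.batch.size.bytes`) controls how large each bulk request gets.

```python
import json
from typing import Dict, Iterable, Iterator, List


def bulk_bodies(rows: Iterable[Dict], index: str, doc_type: str,
                max_entries: int = 1000) -> Iterator[str]:
    """Yield newline-delimited bulk bodies with at most max_entries documents each."""
    lines: List[str] = []
    count = 0
    for row in rows:
        # Each document is preceded by an action/metadata line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(row))
        count += 1
        if count == max_entries:
            # A bulk body must end with a trailing newline.
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:
        # Flush the final, possibly smaller, batch.
        yield "\n".join(lines) + "\n"


# Hypothetical rows standing in for records read from a Hive table.
rows = [{"id": i, "name": f"user{i}"} for i in range(5)]
bodies = list(bulk_bodies(rows, index="hive_export", doc_type="row", max_entries=2))
```

Each string in `bodies` is what would be sent as the body of one `_bulk` request. Larger batches mean fewer round trips but more memory per task, so batch size is usually the first thing to experiment with when tuning load speed.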

For more information on Elasticsearch for Apache Hadoop's native Hive integration, please see the Hive page in the es-hadoop documentation.
