How to do bulk insert from Hive to Elasticsearch for better data load performance?


(Ravi Yadav) #1

I know that ES has a bulk API that can load data from a file, provided each record in the file is preceded by an action/metadata line.

However, my use case is loading data from Hive into ES, which I'm doing by creating an external Hive table with the ES SerDe. How can I leverage the ES bulk API to load data from Hive into ES faster?

I'm using ES 2.1.

Thanks!
Ravi


(James Baiera) #2

That's actually how the es-hadoop connector already loads data into Elasticsearch under the hood. An external table pointing to Elasticsearch is configured with a storage handler that gives Hive hooks for reading data out of Elasticsearch (via the scroll API) and for loading data into Elasticsearch (via the bulk API).
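
To make the bulk format concrete, here is a minimal Python sketch of the kind of request body a bulk load sends: one action/metadata line followed by one source line per record, grouped into batches. This is only an illustration, not the connector's actual code; the function names, the index and type names, and the batch size of 2 are all made up for the example.

```python
import json


def build_bulk_body(rows, index, doc_type):
    """Serialize rows into a newline-delimited bulk request body.

    Hypothetical helper for illustration; not part of es-hadoop.
    """
    lines = []
    for row in rows:
        # Action/metadata line: says what to do with the line that follows.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # A bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


def batches(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Hypothetical rows standing in for records read from a Hive table.
rows = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
    {"id": 3, "name": "gamma"},
]

# One request body per batch; each would be sent to the _bulk endpoint.
bodies = [build_bulk_body(batch, "myindex", "mytype") for batch in batches(rows, 2)]
```

With the storage handler in place you don't build these bodies yourself; a plain INSERT into the external table ends up as batched bulk requests like the ones above.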

For more information on Elasticsearch for Apache Hadoop's native Hive integration, please see the Hive page in the es-hadoop documentation.

