Write into ES using Hive uses only one reducer

fede · March 1, 2018, 8:45am

Hi, I've been testing the ES-Hive library (elasticsearch-hadoop-hive-5.6.4.jar, to match our cluster version) and I always get 1 reducer. Is there any way to force more than 1 reducer? I intend to use Hive to load half a billion records into ES from HDFS.
My query looks like:
set mapred.reduce.tasks=50;
set hive.exec.reducers.max=50;
CREATE EXTERNAL TABLE es_table ( 200 columns here)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes'= 'a list of my nodes (I know I need only one!)',
'es.resource' = 'occurrence_es/occurrence',
'es.index.auto.create' = 'false',
'es.nodes.wan.only' = 'false',
'es.mapping.id' = 'id',
'es.batch.size.entries'= '10000');

and then
INSERT INTO TABLE es_table
SELECT .... FROM my_row_format_table;

Hadoop versions:
CDH 5.12.1-1.cdh5.12.1.p0.3
Hive 1.1.0-cdh5.12.1

I know that I should probably be doing this in Spark, Beam, MR, etc.; I'm just evaluating options that involve less maintenance of source code, libs, etc.

Thanks.

james.baiera · March 17, 2018, 12:03am

This could be an issue with Hive, but it is more likely an issue with the query you are running in the SELECT .... FROM my_row_format_table; statement. Certain Hive operations, such as DISTINCT, require the planner to schedule only one reducer.
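
One way to check this is to look at the query plan, and, if nothing in the query needs a single reducer, to ask Hive for a shuffle across several reducers. The following is only a rough sketch in HiveQL, not a tested fix: it reuses the table names from the question, keeps the elided column list (`....`) exactly as posted, and assumes that the source table has an `id` column suitable for `DISTRIBUTE BY`.

```sql
-- Step 1: inspect the plan. Look at the reduce stage(s) and at operators
-- that imply a single reducer, for example a global ORDER BY or a global
-- aggregate such as count(DISTINCT ...) without GROUP BY.
EXPLAIN
INSERT INTO TABLE es_table
SELECT .... FROM my_row_format_table;  -- "...." is the elided column list from the question

-- Step 2 (assumption: the query needs no global ordering or global aggregate):
-- DISTRIBUTE BY requests a shuffle on the given column, so rows can be spread
-- over the number of reducers configured below.
SET mapred.reduce.tasks=50;
INSERT INTO TABLE es_table
SELECT .... FROM my_row_format_table
DISTRIBUTE BY id;                      -- assumption: `id` exists in the source table
```

If the plan still shows a single reducer after this, the cause is probably an operator inside the SELECT itself rather than the reducer settings.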

cssturkiye · March 24, 2018, 1:32pm

thanks for sharing.


