Write into ES using Hive uses only one reducer

Hi, I've been testing the ES-Hive library (elasticsearch-hadoop-hive-5.6.4.jar, to match our cluster version) and I always get one reducer. Is there any way to force more than one? I intend to use Hive to load half a billion records from HDFS into ES.
My query looks like:
set mapred.reduce.tasks=50;
set hive.exec.reducers.max=50;
CREATE EXTERNAL TABLE es_table (200 columns here)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes' = 'a list of my nodes (I know I need only one!)',
  'es.resource' = 'occurrence_es/occurrence',
  'es.index.auto.create' = 'false',
  'es.nodes.wan.only' = 'false',
  'es.mapping.id' = 'id',
  'es.batch.size.entries' = '10000');

and then
INSERT OVERWRITE TABLE es_table SELECT .... FROM my_row_format_table;

Hadoop versions:
CDH 5.12.1-1.cdh5.12.1.p0.3
Hive 1.1.0-cdh5.12.1

I know I should probably be doing this with Spark, Beam, plain MapReduce, etc.; I'm just evaluating the options that require the least maintenance of source code and libraries.


This could be an issue with Hive, but it is more likely caused by the query in your INSERT ... SELECT statement. Certain Hive operations, such as a global DISTINCT or ORDER BY, force the planner to schedule a single reducer.
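One common workaround (a sketch only, reusing the `es_table` and the `id` mapping column from the original post) is to force a shuffle with DISTRIBUTE BY, so the reduce phase actually runs and the configured reducer count can take effect:

```sql
-- Sketch: force a reduce phase so writes to ES are spread
-- across the configured number of reducers.
SET mapred.reduce.tasks = 50;
SET hive.exec.reducers.max = 50;

INSERT OVERWRITE TABLE es_table
SELECT ....                 -- the 200 columns from the original query
FROM my_row_format_table
DISTRIBUTE BY id;           -- shuffle rows by the ES document id
```

Note that a plain INSERT ... SELECT with no aggregation is typically map-only, in which case write parallelism is set by the number of map tasks rather than reducers; DISTRIBUTE BY is only needed if you specifically want the writes to happen in reducers.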


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.