Hi, I've been testing the ES-Hive library (elasticsearch-hadoop-hive-5.6.4.jar, to match our cluster version) and I always get 1 reducer. Is there any way to force more than 1 reducer? I intend to use Hive to load half a billion records into ES from HDFS.
My query looks like:
set mapred.reduce.tasks=50;
set hive.exec.reducers.max=50;
CREATE EXTERNAL TABLE es_table ( 200 columns here)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes'= 'a list of my nodes (I know I need only one!)',
'es.resource' = 'occurrence_es/occurrence',
'es.index.auto.create' = 'false',
'es.nodes.wan.only' = 'false',
'es.mapping.id' = 'id',
'es.batch.size.entries'= '10000');
and then
INSERT INTO TABLE es_table
SELECT .... FROM my_row_format_table;
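For reference, one common way to force a reduce stage in Hive is a DISTRIBUTE BY clause on the SELECT. Below is a minimal, untested sketch of that variant. It assumes the source table has an `id` column and that its columns line up positionally with es_table; `SELECT *` only stands in for the elided column list.

```sql
-- Request 50 reducers (same settings as in the original query).
set mapred.reduce.tasks=50;
set hive.exec.reducers.max=50;

-- DISTRIBUTE BY forces a shuffle on id, so the write to ES
-- happens in the reduce tasks.
-- Assumption: my_row_format_table has an `id` column and the same
-- column order as es_table; SELECT * replaces the elided column list.
INSERT INTO TABLE es_table
SELECT * FROM my_row_format_table
DISTRIBUTE BY id;
```

Whether the ES cluster can absorb that many concurrent bulk writers at es.batch.size.entries = 10000 is a separate question and would need testing.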
Hadoop versions:
CDH 5.12.1-1.cdh5.12.1.p0.3
Hive 1.1.0-cdh5.12.1
I know I should probably be doing this in Spark, Beam, MR, etc.; I'm just evaluating options that involve less maintenance of source code, libraries, and so on.
Thanks.