Hive is slow reading data from ES

yousanghz · November 20, 2019, 10:07am

Hive version: 1.2.1
ES version: 5.5.0
hadoop-elasticsearch-5.5.0.jar

ES index: data_monthly (10 shards)

This is the HiveQL:

CREATE EXTERNAL TABLE es_test5(
  id string,
  uid string,
  wb_name string,
  platform string,
  comment_count int,
  fetch_time timestamp,
  play_count int,
  favorite_count int,
  repost_count int,
  monthly_net_inc_favorite_count int,
  monthly_net_inc_play_count int,
  monthly_net_inc_comment_count int,
  release_time timestamp
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.nodes' = '192.168.17.111, 192.168.17.121',
  'es.index.auto.create' = 'false',
  'es.resource' = 'data_monthly',
  'es.read.metadata' = 'true',
  'es.mapping.names' = 'id:_metadata._id, uid:UID');

I am reading data from ES into Hive.

Why does the index have 10 shards but the job starts only 2 map tasks?
Why is it slow?

Please help me. Thanks.

yousanghz · November 21, 2019, 5:40am

Please give me some advice.

rameshkr1994 · November 21, 2019, 1:38pm

Hi @yousanghz.

Is your data stored in ES?

If the data lives in ES, the read will take more time, because everything has to travel over the network.

It is better to use a DSL query when searching ES data.
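
As a rough illustration of the DSL suggestion: es-hadoop accepts a query through the 'es.query' table property, so that Elasticsearch filters documents before they are sent to Hive. The sketch below is not from the thread. The table name es_test5_filtered, the shortened column list, and the play_count threshold are assumptions chosen only for illustration.

```sql
-- Hypothetical table name, column subset, and filter threshold (assumptions).
-- The 'es.query' property holds a query DSL body that Elasticsearch evaluates.
CREATE EXTERNAL TABLE es_test5_filtered(
  id string,
  uid string,
  play_count int
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.nodes' = '192.168.17.111, 192.168.17.121',
  'es.index.auto.create' = 'false',
  'es.resource' = 'data_monthly',
  'es.read.metadata' = 'true',
  'es.mapping.names' = 'id:_metadata._id, uid:UID',
  'es.query' = '{"query": {"range": {"play_count": {"gte": 1000}}}}');
```

Whether this helps depends on how selective the filter is; a query that matches most of the index still transfers most of the data.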

Thanks
HadoopHelp

yousanghz · November 22, 2019, 1:42am

@rameshkr1994
Thank you for your reply, I am very happy to receive a reply.

Please forgive me for my bad English.

My data is indeed stored in ES, but why do 10 shards give only 2 map tasks?

I also tried an index with 20 shards, but the job still started only 2 map tasks.

I saw that the official documentation says: 'In short, roughly speaking more input splits means more tasks that can read at the same time, different parts of the source. More shards means more buckets from which to read an index content (at the same time).'

Did I understand it right?
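
For reference, es-hadoop 5.x documents two read settings that bear on this question: 'es.input.max.docs.per.partition', which advises the connector on the maximum number of documents per input partition when the cluster supports sliced scrolls, and 'es.scroll.size', the number of documents fetched per scroll request. A minimal sketch of setting them on the table above follows. The values are assumptions for illustration, not tuned recommendations, and whether they change the number of map tasks may also depend on how Hive groups input splits.

```sql
-- Illustrative values only (assumptions); measure before adopting them.
ALTER TABLE es_test5 SET TBLPROPERTIES (
  'es.input.max.docs.per.partition' = '50000',
  'es.scroll.size' = '1000'
);
```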

Thank you again!

rameshkr1994 · November 22, 2019, 7:51am

Hi @yousanghz.

Thank you!

Replying to your points:

You: more splits mean more tasks. Me: splits and shards are the same concept in ES.

You: more shards mean more buckets. Me: buckets are a totally different concept from shards.

You: I tried a 20-shard index, but it started 2 maps. Me: what is "map" here? Are you getting 2 map tasks when you run the query from Hive?

Your ES cluster decides the number of shards and the number of cluster nodes.

Finally, you are applying Hadoop concepts to ES.

Correct me if I am wrong!

Thanks
HadoopHelp

system · December 20, 2019, 7:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

