Hive queries to read ES data taking too long | Need suggestions to improve

Preeti_Raj_Buchhada · April 7, 2016, 10:37am

Hello,

My ES version is 1.3.2
I setup an AWS EMR cluster (emr-4.2.0) with Hive 1.0.0 to provide an SQL-like interface to our data in ES. Since, our Analytics team prefers SQL like queries rather than ES.
I'm using the latest ES-Hadoop connector (elasticsearch-hadoop-2.2.0)

The setup was pretty easy and I'm able to connect to my ES (which is on a remote server) from Hive console.

However, queries that typically take few seconds in ES are taking 2 hours from Hive!
For e.g.
select count(*) from my_table where member_id = 1234;

I'm looking for suggestions to make this faster. Am I doing anything incorrectly?

Thanks...

Preeti_Raj_Buchhada · April 7, 2016, 10:58am

To add some more information:
the time taken for the above Hive query (select count(*) from my_table where member_id = 1234;) across different shards was:

Shard 0, which has 3.18 GB data, 8102573 docs took 59 mins.
Shard 8, which has 0.02 GB data, 32432 docs took 16 secs.
Some other shards have 7 GB of data, looks like it'll take 2-3 hrs on those shards.

costin · April 11, 2016, 2:44pm

There's not much that can be done here since Hive does not pushes down the query. That is the count in Hive means actually moving all the data from ES to Hive so it can count it.
ES-Hadoop has no visibility into the query being executed and thus cannot push it down (like in Spark SQL).

P.S. Upgrading your software stack should help - even if not performance wise, it will eliminate a good chunk of bugs.

Preeti_Raj_Buchhada · April 13, 2016, 11:39am

Thanks for your reply Costin, I am now trying Spark SQL and the pushdown is working great!

I've created 2 more queries:

Topic		Replies	Views
Hive read es data slow Elasticsearch es-hadoop	5	1145	December 20, 2019
Hive external table performance issue Elasticsearch es-hadoop	3	1701	August 24, 2018
How to get a better performance to load ElasticSearch data into Hive? Elasticsearch es-hadoop	1	399	February 22, 2021
Performance of Hive queries Elasticsearch es-hadoop	3	2271	July 6, 2017
Spark SQL: count(*) fetches all data Elasticsearch es-hadoop	2	2973	July 6, 2017

Hive queries to read ES data taking too long | Need suggestions to improve

Related topics