Hive external table performance issue

Serge_Lashdev · July 27, 2018, 3:37pm

Hi there,
I created an external table in hive to query indices in ES whith the following statement.
CREATE EXTERNAL TABLE svaadclog (
bytexfr BIGINT,
httpquery STRING,
httpreferer STRING,
httpresultcode INT,
ipsrc VARCHAR(15),
loghost STRING,
targetname STRING,
time TIMESTAMP
) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.resource' = 'svaadclog*/adclog',
'es.query' = '{"query": {
"range": {
"time": {
"gte": "13/04/2018-19:55:00",
"lte": "13/04/2018-20:05:00",
"format": "dd/MM/yyyy-HH:mm:ss"
}
}
}}'
);
A simple select is much slower (460 secs) than the kibana query (< 1sec) and the java processes are CPU intensive. IS there any guideline to improve performance of reading ES whith hive external tables ?
ES stack is the latest , hadoop is 3.1.0

james.baiera · July 27, 2018, 4:37pm

Hive is not a generic SQL client - It is translating your SQL query into a set of map reduce jobs and running them over the scroll output from Elasticsearch. Kibana on the otherhand is taking full advantage of Elasticsearch's search capabilities, and returning paginated results. Hive needs to request all the data in order to materialize its results. There are some things in Hive that can improve its performance that are still in development, namely the filter pushdown functionality, but even with that finished, Hive is unlikely to ever match the performance of a regular query submitted to Kibana.

Serge_Lashdev · July 27, 2018, 4:48pm

Thanks James,
420 sec for returning 510 records is quite far from fast. I would like to cross check my ES index with some blacklists.Are Storm or Spark more adequate?

system · August 24, 2018, 4:48pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to get a better performance to load ElasticSearch data into Hive? Elasticsearch es-hadoop	1	399	February 22, 2021
Performance of Hive queries Elasticsearch es-hadoop	3	2271	July 6, 2017
ES-Hadoop 5.1.1 performance Elasticsearch es-hadoop	3	916	February 14, 2017
Hive queries to read ES data taking too long \| Need suggestions to improve Elasticsearch es-hadoop	4	2252	July 6, 2017
HIVE query time based indices Elasticsearch es-hadoop	3	1218	July 6, 2017

Hive external table performance issue

Related topics