Hive external table performance issue

Hi there,
I created an external table in hive to query indices in ES whith the following statement.
CREATE EXTERNAL TABLE svaadclog (
bytexfr BIGINT,
httpquery STRING,
httpreferer STRING,
httpresultcode INT,
ipsrc VARCHAR(15),
loghost STRING,
targetname STRING,
time TIMESTAMP
) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.resource' = 'svaadclog*/adclog',
'es.query' = '{"query": {
"range": {
"time": {
"gte": "13/04/2018-19:55:00",
"lte": "13/04/2018-20:05:00",
"format": "dd/MM/yyyy-HH:mm:ss"
}
}
}}'
);
A simple select is much slower (460 secs) than the kibana query (< 1sec) and the java processes are CPU intensive. IS there any guideline to improve performance of reading ES whith hive external tables ?
ES stack is the latest , hadoop is 3.1.0

Hive is not a generic SQL client - It is translating your SQL query into a set of map reduce jobs and running them over the scroll output from Elasticsearch. Kibana on the otherhand is taking full advantage of Elasticsearch's search capabilities, and returning paginated results. Hive needs to request all the data in order to materialize its results. There are some things in Hive that can improve its performance that are still in development, namely the filter pushdown functionality, but even with that finished, Hive is unlikely to ever match the performance of a regular query submitted to Kibana.

Thanks James,
420 sec for returning 510 records is quite far from fast. I would like to cross check my ES index with some blacklists.Are Storm or Spark more adequate?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.