Hi guys,
I set up a development Hadoop cluster with Hive and the elasticsearch-hadoop connector to allow SQL-like queries on ES data.
Everything works quite well, joins included, and we are thinking of using it in production, but we have run into a nasty problem:
Like many other users, we use time-based indices for log data in Elasticsearch. To improve the user experience, it would be ideal not to have to create the Hive metastore tables with static indices, but rather with something like this:
CREATE EXTERNAL TABLE dynamic (logsource STRING, bytes BIGINT, src STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'logstash-{@timestamp:YYYY.MM.dd}/unix',
'es.nodes' = 'esnode:9200',
'es.query' = '?q=@timestamp:[2015-12-08T09:33Z TO 2015-12-08T09:35Z]') ;
This is similar to what is already possible when writing data to Elasticsearch: the query string would supply the information needed for Hadoop/Elasticsearch to choose the right indices.
Using the _all index is of course possible, but given the tiered data setup most users have, this is very inefficient.
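To illustrate the index selection I have in mind, here is a minimal Python sketch that derives the daily index names from the time range in es.query. It assumes daily indices named logstash-YYYY.MM.dd and UTC timestamps; daily_indices is a hypothetical helper written for this post, not part of elasticsearch-hadoop.

```python
from datetime import datetime, timedelta


def daily_indices(start: datetime, end: datetime, prefix: str = "logstash-") -> list[str]:
    """Return the daily index names covering [start, end], inclusive.

    Hypothetical helper: assumes one index per UTC day, named
    <prefix>YYYY.MM.dd, and naive datetimes that are already in UTC.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end.date() - start.date()).days
    return [
        prefix + (start.date() + timedelta(days=n)).strftime("%Y.%m.%d")
        for n in range(days + 1)
    ]


# Time range taken from the es.query example above
start = datetime(2015, 12, 8, 9, 33)
end = datetime(2015, 12, 8, 9, 35)
print(",".join(daily_indices(start, end)))  # logstash-2015.12.08
```

A script could generate the table definition from such a list, but whether es.resource accepts a comma-separated list of indices for reads is an assumption here, not something confirmed.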
Are there any suggestions or workarounds for this?
Thanks for any input.