[hadoop] Multiple indexes setting for 'es.resource'


I have a problem with the 'es.resource' configuration for including multiple indexes.

The Hive table I created looks like this:

CREATE EXTERNAL TABLE test (
  date timestamp,
  clientip string,
  request string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'apache-2014.09.29/apache-access',
  -- or
  -- 'es.resource' = 'apache-2014.09.30/apache-access',
  'es.mapping.names' = 'date:@timestamp'
);

I then ran 'select count(*) from test;', a Hive query that counts the
total number of rows in the table.
With a single index, the result matches the ES count:
1454536 for the apache-2014.09.29 index and 215564 for the
apache-2014.09.30 index.
Then I changed 'es.resource' = 'apache-2014.09.29/apache-access' to
'es.resource' = 'apache-2014.09.*/apache-access' or
'es.resource' = 'apache-2014.09.29,apache-2014.09.30/apache-access'
to include multiple indexes.
I ran 'select count(*) from test;' again to count the total number of
documents across the indexes,
but this time the result differs from the ES count.
Hive returns 2919161, whereas it should be 1670100 (1454536 + 215564).
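
A minimal sketch of the cross-check described above, in case it helps anyone reproduce it. It sums the per-index counts reported here, computes how far the Hive result is off, and builds the ES `_count` URL for the same comma-separated index list. The host `http://localhost:9200` and the helper names `count_url` and `es_count` are assumptions for illustration only, not part of the setup described in this post.

```python
import json
from urllib.request import urlopen

# Per-index counts reported in this post (single-index 'es.resource').
REPORTED = {"apache-2014.09.29": 1454536, "apache-2014.09.30": 215564}
# Hive count(*) result with the multi-index 'es.resource'.
HIVE_MULTI_INDEX_COUNT = 2919161


def count_url(indices, doc_type, host="http://localhost:9200"):
    """Build an ES _count URL for a comma-separated list of indices.

    The default host is a placeholder (assumption).
    """
    return "{}/{}/{}/_count".format(host, ",".join(indices), doc_type)


def es_count(indices, doc_type, host="http://localhost:9200"):
    """Ask Elasticsearch for the document count (needs a reachable cluster)."""
    with urlopen(count_url(indices, doc_type, host)) as resp:
        return json.load(resp)["count"]


expected = sum(REPORTED.values())            # what Hive should return
surplus = HIVE_MULTI_INDEX_COUNT - expected  # extra rows Hive reported
print(expected, surplus)
```

Calling `es_count(sorted(REPORTED), "apache-access")` against the real cluster would give the ES-side number to compare with the Hive result.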

Any help would be appreciated.

Environment information:

  • centos base 6.4 64-bit / java version "1.7.0_55"
  • CDH-5.1.2-1.cdh5.1.2.p0.3
  • hive 0.12.0
  • elasticsearch-hadoop-2.0.1
  • 3-node Hadoop and ES cluster
