Wrong number of docs in Elasticsearch

I created an external Hive table with EsStorageHandler and then inserted into it from a large table (61 million rows), but when I look at the index it reports 216368692 docs, which does not match the source row count.

CREATE EXTERNAL TABLE test_orc_es (
acct_num BIGINT,
pur_id BIGINT,
pur_det_id BIGINT,
product_pur_product_code STRING,
prod_amt FLOAT,
pur_trans_date TIMESTAMP,
accttype_acct_type_code STRING,
acctstat_acct_status_code STRING,
emp_emp_code STRING,
plaza_plaza_id STRING,
purstat_pur_status_code STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'purchase/purchase_acct','es.nodes' = 'hadoop1:9200');

hive> INSERT OVERWRITE TABLE test_orc_es
SELECT acct_num, pur_id, pur_det_id, product_pur_product_code, prod_amt,
       pur_trans_date, accttype_acct_type_code, acctstat_acct_status_code,
       emp_emp_code, plaza_plaza_id, purstat_pur_status_code
FROM test_orc_txn;

[hdfs@hadoop1 ~]$ curl 'hadoop1:9200/_cat/indices?v'
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   purchase LgPZQ8D6RZC4Z2O1qobiIQ   5   1  216368692            0     52.1gb           26gb

hive> select count(*) from test_orc_txn;
OK
61429218

I narrowed the issue down to the type of table being copied from. If I copy the data from an external ORC table, the ES document count matches the row count, but if I copy the data from a bucketed transactional table, the document count doesn't match the table's row count.
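
One way to check whether the surplus documents are duplicates is to write to a separate index with an explicit document ID. By default elasticsearch-hadoop lets Elasticsearch generate the IDs, so a row that is written more than once becomes more than one document; with `es.mapping.id` set, a repeated write overwrites the earlier document instead. A minimal sketch, assuming `pur_det_id` is unique per row (the post does not say so) and using a hypothetical table name `test_orc_es_by_id` and a hypothetical index name `purchase_by_id`:

```sql
-- Assumption: pur_det_id uniquely identifies a row. Verify that first:
-- total_rows and distinct_ids should be equal if it does.
SELECT COUNT(*)                   AS total_rows,
       COUNT(DISTINCT pur_det_id) AS distinct_ids
FROM test_orc_txn;

-- Hypothetical table and index names, used only for this comparison.
-- Same columns as test_orc_es; the only functional change is es.mapping.id.
CREATE EXTERNAL TABLE test_orc_es_by_id (
  acct_num BIGINT,
  pur_id BIGINT,
  pur_det_id BIGINT,
  product_pur_product_code STRING,
  prod_amt FLOAT,
  pur_trans_date TIMESTAMP,
  accttype_acct_type_code STRING,
  acctstat_acct_status_code STRING,
  emp_emp_code STRING,
  plaza_plaza_id STRING,
  purstat_pur_status_code STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource'   = 'purchase_by_id/purchase_acct',
  'es.nodes'      = 'hadoop1:9200',
  -- Use the column value as the document _id so rewrites are idempotent.
  'es.mapping.id' = 'pur_det_id'
);

INSERT OVERWRITE TABLE test_orc_es_by_id
SELECT acct_num, pur_id, pur_det_id, product_pur_product_code, prod_amt,
       pur_trans_date, accttype_acct_type_code, acctstat_acct_status_code,
       emp_emp_code, plaza_plaza_id, purstat_pur_status_code
FROM test_orc_txn;
```

If docs.count for the new index then equals distinct_ids, that would suggest the extra documents in `purchase` are repeated writes of the same rows rather than different data. It would not by itself explain why the transactional source table triggers the repeated writes.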
