Imported records in Elasticsearch surpassed the original count in the CSV

StephenC · May 2, 2018, 6:20am

I am using the Hive-Elasticsearch integration to import a 500 GB CSV file containing 550 million records.
I have a 5-node cluster, and the index is configured with replicas = 0, shards = 10, and refresh_interval = -1.
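
For clarity, the index settings described above would correspond roughly to the request body below. This is only a sketch: the helper name `build_index_body` is made up for illustration, and I am assuming the standard Elasticsearch setting names `number_of_shards`, `number_of_replicas`, and `refresh_interval`. It only builds and prints the JSON and does not contact a cluster.

```python
import json


def build_index_body(shards: int = 10, replicas: int = 0,
                     refresh_interval: str = "-1") -> dict:
    """Build a create-index request body (hypothetical helper).

    Defaults mirror the values stated in the post:
    10 primary shards, no replicas, refresh disabled during the bulk load.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                # "-1" disables periodic refresh
                "refresh_interval": refresh_interval,
            }
        }
    }


if __name__ == "__main__":
    # Print the body as JSON so it can be compared with the live index settings.
    print(json.dumps(build_index_body(), indent=2))
```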

My job ran for around 35 hours, and the number of records indexed into ES surpassed 550 million. I had to stop the job; at that point the state was:
Record count: 645 million; index size: 490 GB.
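
To put the overshoot in numbers, here is a quick sketch of the arithmetic using the rounded figures above. The function name `overshoot` is just for illustration.

```python
EXPECTED_RECORDS = 550_000_000   # records in the source CSV (as stated above)
OBSERVED_RECORDS = 645_000_000   # document count when the job was stopped


def overshoot(expected: int, observed: int) -> tuple[int, float]:
    """Return the number of surplus documents and the surplus as a percentage."""
    extra = observed - expected
    return extra, extra / expected * 100


extra, pct = overshoot(EXPECTED_RECORDS, OBSERVED_RECORDS)
print(f"{extra:,} extra documents ({pct:.1f}% over the source count)")
```

So the index held about 95 million more documents than the source file has records, roughly 17% over, and the job was still running.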

I am unable to figure out why the job didn't end once it had imported 550 million records. How much longer would I need to wait for it to complete?
Any help would be appreciated.

system · May 30, 2018, 6:20am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.