I am using the Hive-Elasticsearch integration to import a 500 GB CSV file containing 550 million records.
I have a 5-node cluster, with the index configured as: replicas = 0, shards = 10, refresh_interval = -1.
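For reference, here is a rough sketch of the settings body those three values correspond to. It assumes the standard Elasticsearch index setting names (`number_of_shards`, `number_of_replicas`, `refresh_interval`). The index name is not given in this post, so it is left out, and Python is used only to build and print the JSON.

```python
import json

# Index settings as described in the post: 10 shards, no replicas,
# and refresh disabled (-1) for the duration of the bulk import.
# Assumption: standard Elasticsearch index setting names.
index_settings = {
    "settings": {
        "index": {
            "number_of_shards": 10,
            "number_of_replicas": 0,
            "refresh_interval": "-1",
        }
    }
}

# Serialize to the JSON body that would be sent when creating the index.
settings_json = json.dumps(index_settings, indent=2)
print(settings_json)
```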
My job ran for around 35 hours, and the count of records going into ES surpassed 550 million. I had to stop the job; the state when I stopped it was:
Record count: 645 million; index size: 490 GB.
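To put a number on the overshoot, here is a small calculation using the approximate figures from this post (550 million source records, 645 million indexed when the job was stopped). The variable names are just for illustration.

```python
# Approximate figures taken from the post.
source_records = 550_000_000   # records in the CSV
indexed_records = 645_000_000  # documents in the index when the job was stopped

# How many more documents the index holds than the source has records.
extra_records = indexed_records - source_records
extra_percent = 100 * extra_records / source_records

print(f"{extra_records:,} extra records ({extra_percent:.1f}% over the source)")
# prints: 95,000,000 extra records (17.3% over the source)
```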
I am unable to figure out why the job didn't end once it had imported 550 million records. How much longer do I need to wait for it to complete?
Any help would be appreciated.