I am using the EsStorage class to store millions of documents from a 40-node Hadoop cluster to a 3-node Elasticsearch cluster. It is very convenient, but I found out that I am losing many documents during this process.
By default, 70 reducers are instantiated and about 5% of the documents are lost.
I manually reduced the number of reducers to 12, and I'm now losing less than 1% of docs, but I need to reach 0%.
I tried changing the es.batch.size.bytes and es.batch.size.entries parameters, and although this changes the number of lost documents, I'm still far from 0%.
It seems that the Pig connector does not verify whether a batch of documents was successfully indexed. If a batch fails, it does not appear to be retried. Is there a setting parameter I'm missing?
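For reference, here is roughly how I'm invoking EsStorage, with the batch settings I've been tuning plus the bulk-retry settings I found in the es-hadoop configuration docs (es.batch.write.retry.count / es.batch.write.retry.wait). The index name and field list are just placeholders from my job; I'm not sure these retry settings are actually honored by the Pig connector, which is part of my question:

```
-- Register the elasticsearch-hadoop jar (path is from my setup)
REGISTER /path/to/elasticsearch-hadoop.jar;

docs = LOAD '/data/input' USING PigStorage('\t')
       AS (id:chararray, title:chararray, body:chararray);

-- Batch sizing (what I've been tuning) and bulk retries
-- (what I *think* should handle failed batches, per the docs)
STORE docs INTO 'myindex/doc' USING
    org.elasticsearch.hadoop.pig.EsStorage(
        'es.nodes = es-node1:9200',
        'es.batch.size.entries = 500',
        'es.batch.size.bytes = 1mb',
        'es.batch.write.retry.count = 5',
        'es.batch.write.retry.wait = 30s'
    );
```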
Thanks for your help!