We were trying to reindex our data so that we could increase the number of
shards in our index from 20 to 500 (shard count is fixed when an index is
created, so this means reindexing into a new index).
So we used the Tire gem's reindex method (it basically makes a scan search
on the index, scrolls through it, and does a bulk insert for each scroll
batch). But in our dev environment, which has about 250 documents, we found
that:
 - When we reindex with the default size (which is 10, so with 20 shards
that is 200 documents per scroll), we get data loss: only 50 to 60% of the
documents end up in the new index.
 - When we tried the scan API and then inserted the documents one by one,
there was no data loss, but this is obviously much slower.
 - When we tried with size 1 (20 documents at a time), there was no data
loss.
So we went ahead and tried size 1 (20 documents at a time) in our
production environment, which has about 30 million records. We found that:
 - Even with size 1 (20 documents at a time), there was data loss: we tried
to index around 220 thousand documents, but only 190 thousand made it into
the new index, so about 30 thousand were lost. It was also slow, so we had
to stop partway through.
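For context on where loss could creep in, the scan/scroll half of the reindex can be sketched like this (plain Ruby; the block passed in stands in for the HTTP call to `_search/scroll`, which is only simulated here with canned pages — no live cluster involved):

```ruby
# Rough sketch of a scan/scroll drain loop. Each call to the (stubbed)
# fetch block must use the _scroll_id returned by the PREVIOUS response,
# and iteration stops when a page comes back empty.
def drain_scroll(first_page)
  docs = []
  page = first_page
  until (hits = page['hits']['hits']).empty?
    docs.concat(hits)                 # in a reindex, this batch is bulk-inserted
    page = yield(page['_scroll_id'])  # fetch the next page of the cursor
  end
  docs
end

# Canned pages standing in for real scroll responses.
pages = {
  'start' => { '_scroll_id' => 's1',
               'hits' => { 'hits' => [{ '_id' => '1' }, { '_id' => '2' }] } },
  's1'    => { '_scroll_id' => 's2',
               'hits' => { 'hits' => [{ '_id' => '3' }] } },
  's2'    => { '_scroll_id' => 's3', 'hits' => { 'hits' => [] } }
}

docs = drain_scroll(pages['start']) { |sid| pages.fetch(sid) }
puts "drained #{docs.length} documents"   # drained 3 documents
```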
Why is this data loss happening during bulk insert?
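One thing we would like to rule out: the `_bulk` endpoint returns HTTP 200 even when individual operations fail, so failures only appear as per-item `error`/`status` fields in the response body, and a client that never inspects them drops those documents silently. A minimal sketch of surfacing such failures (plain Ruby with a made-up sample response; the field layout follows the bulk API's documented format):

```ruby
require 'json'

# Collect the items in a _bulk response that did not succeed. Each item
# is wrapped under its operation name (e.g. {"index" => {...}}); a
# failed operation carries an "error" field and a non-2xx "status".
def failed_bulk_items(response_body)
  body = JSON.parse(response_body)
  return [] unless body['errors']   # top-level flag: any failures at all?
  body['items']
    .map { |item| item.values.first }   # unwrap the per-operation hash
    .select { |op| op['error'] }
end

# Sample response: one document stored, one rejected by a full queue.
sample = JSON.generate(
  'took' => 5, 'errors' => true,
  'items' => [
    { 'index' => { '_id' => '1', 'status' => 201 } },
    { 'index' => { '_id' => '2', 'status' => 429,
                   'error' => 'EsRejectedExecutionException[rejected execution]' } }
  ]
)

failed = failed_bulk_items(sample)
puts "#{failed.length} of 2 items failed (id #{failed.first['_id']})"
```

If rejections like the one above show up under load, that would match the pattern we see: more loss with bigger batches, none when inserting one by one.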