Every time I try to reindex this index with 20 million documents (20 TB) using the reindex API, it stops halfway through. I even tried the sliced scroll (slicing) method to break it into multiple jobs and parallelize the reindexing, but it still stops after some time. Sometimes a node goes down, or the cluster's overall health degrades while this is running. Is there a more efficient way of reindexing such a huge index without knocking over my cluster?
I am doing this through the Kibana console. The following query worked until about halfway through, then knocked one of the nodes down and stopped completely. My query looks like this:
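(The index names below are placeholders; this is the general shape of the sliced reindex request as run from the Kibana console:)

POST _reindex?slices=20&wait_for_completion=false
{
  "source": {
    "index": "my-large-source-index"  // placeholder name
  },
  "dest": {
    "index": "my-large-dest-index"    // placeholder name
  }
}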
Yes. Are more shards better? With 20 shards it was faster while it lasted. Later I tried fewer shards; it was slower and eventually stopped as well.
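For reference, the destination index has to exist with the higher shard count before the reindex starts; a sketch of creating it (the index name is a placeholder, and the replica/refresh settings are common reindex-tuning assumptions, not necessarily the exact values used):

PUT my-large-dest-index
{
  "settings": {
    "number_of_shards": 20,
    "number_of_replicas": 0,   // add replicas back after the reindex finishes
    "refresh_interval": "-1"   // disable refresh during the bulk load
  }
}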
@Christian_Dahlqvist
I have another index with a similar number of documents but only 5 TB in size. I am trying to reindex that one as well. I tried it with 20 shards, and that failed too.
I'm not aware of any regular tests of creating such large shards, so you're a little off the beaten track here. I'd like to know more about how it is failing. Why is the node going down? Does it log anything about its failure?
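In the meantime, it can help to watch the running reindex tasks and throttle them down if the cluster is struggling; a sketch (the task ID below is a placeholder, taken from the _tasks output):

// list in-flight reindex tasks and their progress
GET _tasks?detailed=true&actions=*reindex

// slow a running reindex down; -1 removes the throttle entirely
POST _reindex/oTUltX4IQMOUUVeiohTt8A:12345/_rethrottle?requests_per_second=500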