With the rollover API, there is a tradeoff to work out: how much data should be kept in a single index before rolling over to a new one once the index grows large?
Also, how many shards should we keep in an index even if we configure rollover? For example, keeping all the data in a single shard and rolling over vs keeping the data in 5 shards and rolling over.
I would like some recommendations as a baseline to start my tests from.
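To make this concrete, here is roughly what I have in mind, just as a sketch (the `logs-write` alias, the `logs-000001` index name and the thresholds are placeholders I picked, not anything that has been recommended):

```
# Bootstrap the first index behind a write alias (names are placeholders)
PUT logs-000001
{
  "aliases": {
    "logs-write": { "is_write_index": true }
  },
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}

# Roll over when the index gets large or old (thresholds are just examples)
POST logs-write/_rollover
{
  "conditions": {
    "max_size": "500gb",
    "max_age": "7d"
  }
}
```

Is a size-based condition like this the right way to bound the index, or would you pick the thresholds differently?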
I have a high incoming rate and concurrent searches over the entire data set (hot and cold). I am not very concerned about search latency, but I have to ensure cluster stability (even during extreme searches) and a high indexing rate. As you suggested, I will have an index with 5 shards and index up to 500 GB (approx. 100 GB per shard) on the hot nodes. Later, when I roll over and move the index to the cold nodes, should I reduce it to a single shard containing 500 GB? Does that have any dependency on system RAM? Each of my machines has an 8-core CPU, 64 GB RAM and a 2 TB SSD. What are the recommendations and implications?
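In other words, after rollover, is a shrink like the sketch below what you would suggest (the node name and index names are placeholders; my understanding is that the index has to be made read-only and relocated onto one node before it can be shrunk)?

```
# Relocate all shards onto one cold node and block writes (node name is a placeholder)
PUT logs-000001/_settings
{
  "index.routing.allocation.require._name": "cold-node-1",
  "index.blocks.write": true
}

# Shrink the 5-shard index into a single-shard index
POST logs-000001/_shrink/logs-000001-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}
```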
Thanks for your reply. Is the 50 GB shard size recommendation a limit only for actively indexed shards? I got confused when you said that it is not a problem on cold nodes. Have you seen shards this large before? Could you please explain what could be affected by having huge shards?
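For reference, this is how I am looking at the current shard sizes while testing (just the cat shards API sorted by on-disk size; the index pattern is a placeholder):

```
GET _cat/shards/logs-*?v&h=index,shard,prirep,store,node&s=store:desc
```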