For log analytics, I am going to store log data from various applications in Elasticsearch as time-series indices.
The estimated maximum data volume is 10TB over 1 year, taking the data retention period into account. With 5 primary shards and 3 replicas of each, the total size comes to 40TB (10TB x 4 copies of the data). For 40TB I am considering 20 data nodes with 2TB of storage per node.
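To make the arithmetic behind this estimate explicit, here is a rough sketch in Python. The function names are my own, and it assumes that "3 replica shards" means 3 replica copies of every primary shard and that each node's full 2TB is usable. It ignores any free-space headroom, so treat the result as a lower bound rather than a sizing recommendation.

```python
import math


def total_storage_tb(primary_data_tb, replicas):
    """Total footprint: one primary copy plus `replicas` extra copies."""
    return primary_data_tb * (1 + replicas)


def data_nodes_needed(total_tb, per_node_tb):
    """Smallest node count whose combined capacity holds `total_tb`."""
    return math.ceil(total_tb / per_node_tb)


# Figures from the estimate: 10TB of primary data, 3 replicas, 2TB per node.
total = total_storage_tb(10, 3)
nodes = data_nodes_needed(total, 2)
print(total, nodes)  # 40 20
```

The number of primary shards (5) does not appear in the calculation because it only controls how the data is split up, not how much of it there is.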
My question is: how many data nodes should I start with? Can I start with 10 data nodes in the initial setup and add more data nodes as the data grows?
I would appreciate your suggestions.