I have around 15 GB of data in Elasticsearch, running on a single machine with 5 shards and currently no replicas. I am running into data corruption problems, so I am thinking of adding another machine to the cluster. Can anyone guide me on the ideal cluster setup, given that my data will grow on a daily basis? Also, can replication help us avoid data corruption?
Data corruption shouldn't happen regardless of whether you have any replicas. I'd investigate that further before creating a more complicated setup on a shaky foundation.
The "ideal cluster setup" question is too broad for a meaningful answer.
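For reference, the replica count is a per-index setting that can be changed on a live index through the index settings API, so it does not have to be decided up front. The sketch below only builds the request and does not send it. The host `http://localhost:9200`, the index name `my_index`, and the helper `build_replica_request` are placeholders for illustration, not anything from your setup. It does nothing about the cause of the corruption; it only shows what changing the replica count would look like once a second node has joined.

```python
import json
import urllib.request


def build_replica_request(base_url, index, replicas):
    """Build (but do not send) a PUT request to the index settings API.

    base_url and index are placeholders; adjust them to your cluster.
    """
    # Body understood by the update-settings endpoint.
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index)
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_replica_request("http://localhost:9200", "my_index", 1)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To apply it against a real cluster: urllib.request.urlopen(req)
```

With a single node, replicas stay unassigned and cluster health reports yellow, so this is only worth applying after the second machine is in the cluster.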
We are facing something very similar to this: Sum Aggregation returning very small, unrelated values. It is quite frequent: it happens about once a week, when we update the data.
Elasticsearch version is: 1.6.0