We are doing a POC to analyse and visualize 64 million records (175 columns) of data on a 3-node cluster. Below are the settings:
Each node capacity: CPU: 10 cores; RAM: 64 GB; Storage: 150 GB
We are using Logstash to ingest the data from Netezza into Elasticsearch. After starting Logstash, we set the index's replica count to 0, since this is only a test. We are still getting a lot of low-disk errors, even though the cluster has 450 GB of storage in total (150 GB per node). For reference, I checked the index size: 12.5 million records took around 50 GB on a single node, so at that rate the full 64 million records would need roughly 256 GB. The pipeline and the API calls we used are sketched below.
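The Logstash pipeline is along these lines (a minimal sketch; the driver path, hostnames, credentials, table, and index name below are placeholders, not our actual values):

```
input {
  jdbc {
    # Netezza JDBC driver; path and connection details below are placeholders
    jdbc_driver_library => "/opt/drivers/nzjdbc.jar"
    jdbc_driver_class => "org.netezza.Driver"
    jdbc_connection_string => "jdbc:netezza://netezza-host:5480/MYDB"
    jdbc_user => "user"
    jdbc_password => "password"
    # page through the table rather than pulling all 64M rows in one result set
    jdbc_paging_enabled => true
    jdbc_page_size => 50000
    statement => "SELECT * FROM MY_TABLE"
  }
}

output {
  elasticsearch {
    hosts => ["http://es-node1:9200", "http://es-node2:9200", "http://es-node3:9200"]
    index => "netezza_data"
  }
}
```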
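The replica change and the size checks were done with the standard settings and cat APIs, roughly as follows (again, `netezza_data` is a placeholder index name):

```
# drop replicas to 0 for the test index
curl -X PUT "http://localhost:9200/netezza_data/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 0}}'

# per-node disk usage and shard counts
curl -s "http://localhost:9200/_cat/allocation?v"

# per-index document count and store size
curl -s "http://localhost:9200/_cat/indices/netezza_data?v&h=index,docs.count,pri.store.size,store.size"
```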
I have the following questions:
Can you please let us know how storage is split across the cluster? For example, if disk space on one node runs low, will Elasticsearch store the data on the other nodes instead?
Do you have any recommendations for a better way to ingest the data into the cluster?
What are the next steps if a node goes into read-only mode?
We would appreciate your quick response on this.