I have a fully working and tuned Elasticsearch host, say Host-A, which also runs Logstash. It is a standalone ES host: Logstash ingests the data files into Elasticsearch, and Kibana is the front end. The entire ELK stack runs on a single machine.
Host-A has 32 cores, 512 GB of RAM and a 3 TB SSD. I have optimized the JVM settings, input throttles, etc. Logstash is currently ingesting about 1 billion records at a rate of 25 million documents per hour (an average indexing rate of about 6,500/s).
However, I noticed that although I have configured 32 worker threads for the Logstash instance, the data is not ingested any faster.
I have two more hosts with the same hardware that I can add to the cluster, but I could not find documentation for a configuration that matches my scenario.
Can I take the existing standalone Host-A and convert it to a Master+Data node? Host-A has already indexed 500 million records in the last two days. The layout I have in mind is:
Host-A : ClusterA: NodeA - Master:True, Data:True
Host-B : ClusterA: NodeB - Master:False, Data:True
Host-C : ClusterA: NodeC - Master:False, Data:True
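To make the layout above concrete, this is roughly what I imagine each node's elasticsearch.yml would contain. It is only a sketch: I am assuming the `node.master` / `node.data` settings used by older Elasticsearch releases (newer releases express the same thing through `node.roles`), and I have left out the discovery settings that tell the nodes how to find each other, since those depend on the version.

```yaml
# Host-A: elasticsearch.yml (existing standalone node, master-eligible + data)
cluster.name: ClusterA
node.name: NodeA
node.master: true
node.data: true
---
# Host-B: elasticsearch.yml (data only)
cluster.name: ClusterA   # must match on every node that should join
node.name: NodeB
node.master: false
node.data: true
---
# Host-C: elasticsearch.yml (data only)
cluster.name: ClusterA
node.name: NodeC
node.master: false
node.data: true
```

The three sections are separate files, one per host. I have put them in one block only for comparison.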
Is the above a valid configuration? What about Logstash? There is only one Logstash instance, running on the master, which is also where it ingests the log files from.
Will the data be distributed across NodeB and NodeC? How does that work? Do I have to allocate shards myself, or will that be taken care of when I add the other hosts to the cluster?
Does the JVM need to be optimized the same way on the secondary nodes? How is the processing distributed in this case?