Good morning,
My team and I are trying to improve a cluster that we inherited. I apologize for the vagueness; because of where the cluster is located I can't share everything, but I will give as much detail as possible. We are currently working with a 7-node cluster. All data nodes are Dell R640s, and we are on version 7.6. We also have three Logstash servers feeding data into the cluster. There is a small amount of pre- and post-processing happening on the Logstash servers, but all of the data nodes are still set to the default roles, so they all show up as dilm nodes (data, ingest, machine learning, and master-eligible). There is a huge ingest pipeline on the cluster that does most of the processing and mapping of the incoming data.
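For anyone not used to the role abbreviations, here is a minimal Python sketch of how I read the "dilm" string that `GET _cat/nodes?h=name,node.role` reports. The letter mapping is my understanding of the 7.x abbreviations, and `decode_roles` is just a hypothetical helper name, not anything shipped with Elasticsearch.

```python
# Assumed mapping of node.role letters in Elasticsearch 7.6 (not exhaustive).
ROLE_LETTERS = {
    "d": "data",
    "i": "ingest",
    "l": "machine learning",
    "m": "master-eligible",
}


def decode_roles(abbrev):
    """Translate a node.role string such as 'dilm' into readable role names."""
    if abbrev == "-":
        # A lone dash means the node has no roles: coordinating only.
        return ["coordinating only"]
    # Letters outside the assumed mapping are flagged rather than dropped.
    return [ROLE_LETTERS.get(ch, "unknown (%s)" % ch) for ch in abbrev]


print(decode_roles("dilm"))
# ['data', 'ingest', 'machine learning', 'master-eligible']
```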
I have been doing research and have been told that it's not ideal to use both Logstash and ingest nodes. This may be true, but moving all of the ingest processing out of the cluster and onto the Logstash servers seems like a pretty major undertaking on a production cluster. We are currently performing about 20,000 indexing operations per second within the cluster. I was hoping to bring two more nodes online and make them indexing-only nodes. While I am at it, I would like to make only 3 of the data nodes master-eligible, and for the time being I am going to disable machine learning on all nodes, as it is not currently in use.
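To make the plan concrete, here is a rough Python sketch that prints the `elasticsearch.yml` role settings I think each node type would need on 7.6, using the boolean settings `node.master`, `node.data`, `node.ingest`, and `node.ml`. Two assumptions on my part: "indexing-only" here means dedicated ingest nodes, and the ingest role would be turned off on the data nodes once the new nodes are online. `role_settings` is a hypothetical helper, and none of this has been tested on our cluster.

```python
def role_settings(master, data, ingest, ml=False):
    """Render the 7.x boolean role settings as elasticsearch.yml lines."""
    flags = {
        "node.master": master,
        "node.data": data,
        "node.ingest": ingest,
        "node.ml": ml,  # machine learning is not in use, so off by default
    }
    return "\n".join(
        "%s: %s" % (key, str(value).lower()) for key, value in flags.items()
    )


# Assumed split: 3 master-eligible data nodes, 4 data-only nodes,
# and 2 new dedicated ingest nodes.
PLANNED_NODES = {
    "master-eligible data node (3 of 7)": role_settings(True, True, False),
    "data-only node (remaining 4)": role_settings(False, True, False),
    "dedicated ingest node (2 new)": role_settings(False, False, True),
}

for label, settings in PLANNED_NODES.items():
    print("# " + label)
    print(settings)
    print()
```

Each node would need a restart to pick up the new settings, so I would expect to roll this out one node at a time.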
I was wondering whether you guys think this is a decent or a horrible idea. I am also open to any other suggestions that you believe would make this cluster run more efficiently.
Thanks