I have a cluster of 4 master, 8 data, and 2 client/query nodes that ingests logs from Logstash. Currently, Logstash sends to all the data nodes (my assumption is that it load balances across them). Is this the optimal setup, or should Logstash send to the client/query nodes instead? Right now the client nodes are what Kibana and Grafana talk to, and they have the most RAM and cores provisioned.
I do foresee bursts of queries coming in (~100), but not more. However, the queries could return a lot of data.
Why 4 masters: I started with 3 spread across different subnets, and I thought one of them was flaky because of subnetting issues, so I brought up another one. That turned out not to be the case, but I kept all 4. Ideally the number of masters should be odd, right? (If so, I will bring one down.)
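As far as I understand it, the usual argument for an odd count is majority arithmetic. Here is a rough sketch of that reasoning, assuming a master election needs a strict majority of the master-eligible nodes (the function names are just mine for illustration, not anything from Elasticsearch):

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


# Compare the node counts relevant to this cluster.
for n in (3, 4, 5):
    print(f"masters={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

If that assumption holds, 4 masters tolerate the same single failure as 3, so the fourth node adds nothing in terms of fault tolerance.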
I'm slightly overwhelmed by all the knobs available for squeezing out the best performance, but I will get there soon, hopefully. (Hint: a pointer to a good tips-and-tricks guide would help.)