I have to decide between setting up an extra cluster for monitoring and just letting the production cluster monitor itself.
Aside from being able to see historical data if the monitored nodes are unavailable, are there any other benefits to having a production AND a monitor cluster?
As you mentioned, if the monitored nodes experience any issues (outage, load, etc.), the monitoring data is still available in the separate monitoring cluster. This is extremely useful if you ever need to investigate potential issues and perform a root cause analysis (e.g. if a node went down at a given time T1, your monitoring data between T0 and T1 might give some clues about what happened before the node went down).
Performance. If you self-monitor your cluster, it has to collect the monitoring data and also write that data into itself, which can have a performance hit. It is best to let your production cluster handle the critical business operations and just export the monitoring data to another cluster.
These are the main reasons that I can think of, @Mattness. From our experience with customers, self-monitoring production clusters is generally a bad idea. From both a troubleshooting and a performance perspective, I would strongly advise you to follow our documented recommendations.
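To make the root cause analysis scenario above a bit more concrete, here is a minimal Python sketch of pulling the samples recorded in the window leading up to an outage at T1. Everything in it is an assumption for illustration only: the sample tuples, the node names, the heap values and the `samples_before_outage` helper are all hypothetical and do not reflect any real monitoring API or schema. The point is simply that this lookup only works if the data lives somewhere that is still reachable after the node has gone down.

```python
from datetime import datetime, timedelta

# Hypothetical samples as they might be kept in a separate monitoring cluster:
# (timestamp, node name, heap usage percent). All values are invented.
samples = [
    (datetime(2024, 1, 1, 11, 40), "node-1", 62),
    (datetime(2024, 1, 1, 11, 50), "node-1", 81),
    (datetime(2024, 1, 1, 11, 58), "node-1", 97),
    (datetime(2024, 1, 1, 12, 10), "node-2", 55),
]


def samples_before_outage(samples, node, t1, lookback):
    """Return the node's samples in the window [t1 - lookback, t1], oldest first."""
    t0 = t1 - lookback
    return sorted(s for s in samples if s[1] == node and t0 <= s[0] <= t1)


t1 = datetime(2024, 1, 1, 12, 0)  # assumed time at which node-1 went down
window = samples_before_outage(samples, "node-1", t1, timedelta(minutes=15))

for ts, node, heap in window:
    print(ts.isoformat(), node, f"heap={heap}%")
```

In this made-up data set, the heap usage of the failed node climbs steadily in the minutes before T1, which is the kind of clue you would lose if the monitoring data were stored only on the nodes that became unavailable.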