I need to manage two shared Elasticsearch environments, one for testing and one for production. What are the best practices for managing multiple environments, e.g. separate clusters vs. separate indexes?
Our current setup is a single Elasticsearch cluster with separate index patterns for the test and production environments. That in itself doesn't seem to be a problem, given that this cluster is only used internally.
But we also have a single instance of Logstash feeding both the test and production index patterns. We tag the inputs from test and prod sources and use if statements to decide whether a feed goes to test or production. So now we can't test changes without affecting the production feed. It's not a big deal now because we are still in "beta mode", but it does seem like an antipattern.
Different data sources with overlapping structured data
We are also sending logs from both our front-end web app and our back-end API server to the same index pattern. We justify this because there is sufficient overlap in the structured data to make it useful: we can look up a user ID and see the log history from both the front end and the back end. However, the vast majority of the data, 85%, comes from the back end. Is this likely to cause issues? We like being able to select a single index pattern in Kibana and see the results from multiple sources.
Can anyone suggest best practices?
Should we have separate clusters for test vs production Elasticsearch?
Should we have separate instances of Logstash, and if so, can they run on the same server or is a different server preferred?
Is it OK practice to put data from both back end and front end into the same index pattern if the structured data is similar enough?
The boring answer to most of your questions is "it depends".
Should we have separate clusters for test vs production Elasticsearch?
Perhaps. Is it likely that your Logstash test setup induces such a significant load that it might disturb the ES production traffic? Will Logstash require a different ES configuration? Is it important that you use the same index names for production and non-production (necessitating two clusters)?
From a Logstash perspective I see few reasons to have different ES clusters, but having different index series for production and non-production is useful.
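As a rough illustration of separate index series, a Logstash output section could route events by tag along the lines below. This is only a sketch: the tag names `prod` and `test`, the index prefixes `prod-logs` and `test-logs`, and the host address are placeholders assumed for the example, not details of the setup described in the question.

```
output {
  # Events tagged "prod" go to the production index series (hypothetical name).
  if "prod" in [tags] {
    elasticsearch {
      hosts => ["http://localhost:9200"]   # placeholder host
      index => "prod-logs-%{+YYYY.MM.dd}"
    }
  }
  # Events tagged "test" go to a separate, non-production index series.
  else if "test" in [tags] {
    elasticsearch {
      hosts => ["http://localhost:9200"]   # same cluster, different index series
      index => "test-logs-%{+YYYY.MM.dd}"
    }
  }
}
```

With distinct prefixes, each environment can get its own index pattern in Kibana (for example `prod-logs-*`), and test indexes can be deleted without touching production data. Moving to two clusters later would mostly mean changing the `hosts` value in one branch.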
Should we have separate instances of Logstash, and if so, can they run on the same server or is a different server preferred?
If you want to test Logstash configurations in a production-like environment I'd use a separate instance, yes. I don't run that myself because all the pre-production testing I feel I need can be satisfied with local testing on my workstation and testing of filter behavior with Logstash Filter Verifier.
If it's convenient to run them on the same server (and it's not an issue capacity-wise) then go for it. If it requires hoop-jumping, maybe not. I only run Logstash in Docker containers, so unless two instances want to listen on the same TCP/UDP port it's trivial to run any number on the same host.
Is it OK practice to put data from both back end and front end into the same index pattern if the structured data is similar enough?
I'll always answer YES to this. There are too many reports of people doing something on a system they think is not production, only for it to cause outages because that assumption was wrong. Mixing prod and non-prod environments is never a good thing.
Mixing non-prod environments is more reasonable, e.g. a single cluster with test/dev/qa.