To cluster or not to cluster?

Hi,

I have the following setup with two apps running on two servers, both on the same subnet:

app_1 > filebeat_1 > logstash_1 > elasticsearch_1 (es_1) > kibana_1
app_2 > filebeat_2 > logstash_2 > elasticsearch_2 (es_2) 

I have little need for replica shards, as each Elasticsearch instance only needs to store its own app's data locally.

However, I'd like to be able to view all indexes from my single Kibana instance and create visualizations with data from both es instances.

Despite heavy reading on the forum, I'm quite unsure as to which of the following might be the best solution for me:

  1. Cluster the two es instances together (making sure I have 0 replica shards in my index settings)
  2. Set up cross-cluster search
  3. Set up a new tribe node and connect Kibana to it

Any recommendations?

Replica shards are more useful for resiliency than for local querying - if you have two data nodes and all your data is replicated, you still have an accessible copy of the data if one of your servers dies.
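For reference, the replica count is an index-level setting (`index.number_of_replicas`) changed through the index settings API, not something that lives in the mappings. Here is a rough sketch of what that request could look like. It only builds the request and doesn't send it; the helper name `replica_settings_request` is made up for illustration, and `filebeat-*` is just an assumed index pattern, so substitute whatever your Logstash output actually writes to.

```python
import json


def replica_settings_request(index_pattern: str, replicas: int = 0) -> dict:
    """Describe a PUT <index>/_settings call that sets the replica count.

    Hypothetical helper: it returns the method, path, and JSON body
    without contacting any Elasticsearch node.
    """
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return {
        "method": "PUT",
        "path": f"/{index_pattern}/_settings",
        "body": {"index": {"number_of_replicas": replicas}},
    }


# Assumed index pattern; adjust to your own index names.
req = replica_settings_request("filebeat-*")
print(req["method"], req["path"])
print(json.dumps(req["body"]))
```

Note that this only affects indexes that already exist. For new indexes you'd want the same setting in an index template.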

Without knowing more about your use case, it's difficult to give a concrete recommendation between clustering two nodes and using cross-cluster search. I can tell you, though, that you absolutely should not set up a tribe node - cross-cluster search is better in just about every way, and tribe node functionality has been entirely removed in 7.0 and up, so staying away from it now will make future upgrades easier.

Thanks @gbrown, it's clear that the tribe node is the obsolete approach.

Right now I have no need for resiliency, i.e. I can afford to lose data, but I want to scale out and have more apps creating distributed local data lakes, with one Kibana able to view and analyze/visualize everything.

I think this is the cross-cluster search use case. In the time I spent configuring it over the weekend, I realized that cross-cluster search seems like a "read only" version of clustering, without the privilege to write/replicate indexes across remote nodes.

Is that a fair assessment or am I missing some other capability by doing cross-cluster search and not clustering?

That's correct, you can use cross-cluster search to query (and therefore analyze/visualize/all that Kibana goodness) across multiple clusters. You're right that the remote clusters will be read-only from the local one, and data won't be replicated across clusters.
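To make that concrete, here is a minimal sketch of the two pieces involved: registering the remote cluster on the local one, and addressing remote indexes with the `alias:index` syntax. It only builds the request body and the index expression. The alias `es_2`, the host `es-2.example.internal`, the `filebeat-*` pattern, and both helper names are assumptions for illustration. The seed should point at the remote node's transport port (9300 by default), not the HTTP port. The `cluster.remote.*` setting name is what recent versions use; as far as I know, versions before 6.5 used `search.remote.*` instead, so check the docs for your version.

```python
import json


def remote_cluster_settings(alias: str, seeds: list) -> dict:
    """Body for PUT _cluster/settings registering a remote cluster.

    Hypothetical helper; seeds are "host:transport_port" strings.
    """
    return {"persistent": {"cluster": {"remote": {alias: {"seeds": list(seeds)}}}}}


def cross_cluster_pattern(index_pattern: str, remote_aliases: list,
                          include_local: bool = True) -> str:
    """Build an index expression covering local and remote indexes."""
    parts = [index_pattern] if include_local else []
    parts += [f"{alias}:{index_pattern}" for alias in remote_aliases]
    return ",".join(parts)


# Assumed alias and host name for the second one-node cluster.
settings = remote_cluster_settings("es_2", ["es-2.example.internal:9300"])
pattern = cross_cluster_pattern("filebeat-*", ["es_2"])

print(json.dumps(settings, indent=2))
print(pattern)  # filebeat-*,es_2:filebeat-*
```

An expression like that can be used as the target of a `_search` request on the local cluster, and Kibana index patterns accept the same `alias:pattern` form.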

Given what you describe wanting, cross-cluster search does sound like it fits your use case - a read-only connection between your two one-node clusters that allows for search and aggregation through a single Kibana instance. As long as you're okay not being able to write to the "remote" cluster from your single Kibana instance (which it sounds like you are), cross-cluster search should work just fine, and it sounds like you have a good understanding of what it's for.
