How do load balancing and "talking to a cluster" work?

defalt · June 16, 2020, 8:18am

Hi,
I have learned a lot about clusters in the past 2 weeks, but one thing I don't get the hang of is how to talk to the cluster. What I mean is that I have a Redis database and I would like to ingest data into a hot-warm cluster through Logstash. I know that ingest and coordinating nodes exist, but wouldn't a single one of those be a single point of failure?

How do I set up a reliable cluster with no single points of failure (failover procedures)? Is it possible to talk to all nodes at once and have the nodes themselves decide which node has to perform which task? Hope you understand what I am trying to say.
Thanks
Defalt

warkolm · June 16, 2020, 8:21am

How much money do you have? I am not being flippant, it's an honest question as to how much you are willing to mitigate SPOFs.

No, you can talk to any node and it'll figure out where it needs to go. But that connection you have to that node is still a SPOF.
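To make "it'll figure out where it needs to go" a bit more concrete, here is a rough toy model in Python. It assumes the documented routing idea that a document's shard is a hash of its routing value (by default the document ID) modulo the number of primary shards. The hash function (`zlib.crc32`), the node names, and the shard-to-node map are stand-ins for illustration, not what Elasticsearch uses internally.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a primary shard number.

    crc32 is only a stand-in hash for this sketch; Elasticsearch uses
    its own hash function internally.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def route(doc_id: str, shard_to_node: dict, num_primary_shards: int):
    """Return (shard, node) for a document.

    The result does not depend on which node received the request,
    which is why a client can send the request to any node.
    """
    shard = pick_shard(doc_id, num_primary_shards)
    return shard, shard_to_node[shard]


# Hypothetical 3-shard index spread over three hypothetical nodes.
SHARD_TO_NODE = {0: "node-a", 1: "node-b", 2: "node-c"}
```

Whichever node receives `route("doc-42", SHARD_TO_NODE, 3)` computes the same answer and forwards the request, so the remaining weak spot is the client's connection to that one node.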

Christian_Dahlqvist · June 16, 2020, 8:24am

In the default configuration every node has all roles, and this is a great place to start. Just because you CAN have dedicated node types does not mean you SHOULD.

defalt · June 16, 2020, 8:29am

How much money do you have?

Around 5k-7k€. We think we would start with 3-5 Nodes and upgrade the cluster as we need more capacity.

Is it possible for Logstash to switch nodes if:

- one node is under heavy load, or
- the node it is ingesting to goes down?

I know that Kibana has such capabilities with elasticsearch.hosts.

My new idea for the cluster would be that Kibana and Logstash have connections to all nodes and then decide which one should be used (just like elasticsearch.hosts). Is that the right approach?
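For illustration, the idea of "connect to all nodes and pick one" can be sketched as a round-robin host pool that skips hosts it has seen fail. This is only a minimal Python sketch of the concept, not how Logstash or Kibana implement it; `HostPool`, `send`, the `transport` callable, and the host URLs are all hypothetical names.

```python
from itertools import cycle


class HostPool:
    """Round-robin over a list of hosts, skipping ones marked dead."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self._ring = cycle(self.hosts)
        self.dead = set()

    def mark_dead(self, host):
        self.dead.add(host)

    def mark_alive(self, host):
        self.dead.discard(host)

    def next_host(self):
        # Try each host at most once per call.
        for _ in range(len(self.hosts)):
            host = next(self._ring)
            if host not in self.dead:
                return host
        raise RuntimeError("no live hosts")


def send(pool, payload, transport):
    """Send payload via transport(host, payload), failing over on errors."""
    last_error = None
    for _ in range(len(pool.hosts)):
        host = pool.next_host()
        try:
            return transport(host, payload)
        except ConnectionError as exc:
            pool.mark_dead(host)  # stop using this host for now
            last_error = exc
    raise RuntimeError("all hosts failed") from last_error


# Hypothetical node addresses.
HOSTS = ["http://es-node-1:9200", "http://es-node-2:9200", "http://es-node-3:9200"]
```

In practice the Logstash elasticsearch output and Kibana's elasticsearch.hosts both accept a list of hosts, so a hand-written pool like this should not be needed; the sketch only shows what happens when one node in the list goes away.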

warkolm · June 16, 2020, 8:38am

In all honesty, I wouldn't be too worried about SPOFs. If this is business critical and you cannot do without downtime, then the business needs to value it appropriately.

Just aim for a 3+ node cluster, across different data centres in the same geographic region. Use replicas, automatic backups, stay up to date with your packages. Use ILM + SLM if you can. Things like that will help while maintaining your budget.

You may be even better off using something like Elasticsearch Service (managed Elasticsearch on AWS, Google Cloud, and Azure) to reduce complexity, while getting access to all of the features I mentioned, and more.

The docs go into how it handles things on that level. But on a high level, yes.

defalt · June 16, 2020, 8:44am

Thanks for your great answers.

What are the best backup capabilities in Elastic? Is taking snapshots still practical, or are there better backup technologies?

warkolm · June 16, 2020, 8:45am

Snapshots are the best for backups. If you had the money you could run multiple clusters and then use cross cluster replication.

But even then, taking snapshots is a backup. CCR is not.

defalt · June 16, 2020, 8:48am

We currently have 8 Snapshots over the course of 4 years. What is the best way to restore all that data and how should we approach this in the future? Currently we are just restoring the snapshots and creating new Indices for each. Each snapshot contains 5 Indices so in the end we will have 40 indices.

warkolm · June 16, 2020, 8:51am

That's not a lot. Our Elasticsearch Service takes them every 30 minutes.

Not sure what you mean by best way though. A restore is a restore.
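As a rough sketch of the restore described above (8 snapshots with 5 indices each, every restore landing in its own new index), the request bodies for the snapshot restore API could be generated like this. The repository, snapshot, and index names are made up for the example; `rename_pattern` and `rename_replacement` are the restore options used to avoid name clashes, and the exact behaviour should be checked against the restore docs for your version.

```python
def restore_requests(repository, snapshots, indices):
    """Build one restore request per snapshot.

    Each restored index gets the snapshot name appended so that the
    same index restored from different snapshots does not collide.
    """
    requests = []
    for snap in snapshots:
        body = {
            "indices": ",".join(indices),
            "rename_pattern": "(.+)",
            "rename_replacement": f"$1-{snap}",
        }
        path = f"/_snapshot/{repository}/{snap}/_restore"
        requests.append(("POST", path, body))
    return requests


# Hypothetical names: 8 snapshots, 5 indices each.
SNAPSHOTS = [f"snap-{i}" for i in range(1, 9)]
INDICES = [f"logs-{c}" for c in "abcde"]
REQUESTS = restore_requests("my_repository", SNAPSHOTS, INDICES)
```

That yields 8 requests covering 8 × 5 = 40 restored indices, matching the count mentioned above.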

defalt · June 16, 2020, 8:57am

Yes, but the restored data is smaller than the original index because it's incremental. So is it better to reindex them together into one big index or into multiple smaller ones?

After reading all your answers, this is the setup I came up with. Is this a good approach?

Thanks again for your help.

warkolm · June 16, 2020, 9:13am

Snapshots are incremental, so take them every N minutes, where N is your disaster recovery requirements.
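A minimal sketch of "every N minutes" expressed as a snapshot lifecycle management (SLM) policy body, built in Python. The policy ID, repository name, snapshot name template, and retention values are assumptions for illustration, and the cron expression uses the seconds-first format that SLM schedules take; check it against the SLM docs for your version before using it.

```python
import json


def slm_policy(every_n_minutes, repository):
    """Build an SLM policy body that snapshots every N minutes.

    Assumes N divides 60 evenly so the schedule repeats cleanly each hour.
    """
    if every_n_minutes <= 0 or 60 % every_n_minutes != 0:
        raise ValueError("every_n_minutes must be a positive divisor of 60")
    return {
        # Fields: second, minute, hour, day-of-month, month, day-of-week
        "schedule": f"0 0/{every_n_minutes} * * * ?",
        "name": "<snap-{now/d}>",            # hypothetical name template
        "repository": repository,             # hypothetical repository
        "config": {"indices": ["*"]},
        "retention": {"expire_after": "30d"},  # assumed retention
    }


# Body for: PUT /_slm/policy/every-30-min (policy ID is made up)
POLICY = slm_policy(30, "my_repository")
POLICY_JSON = json.dumps(POLICY, indent=2)
```

Because snapshots are incremental, a short interval mostly costs the delta since the previous snapshot rather than a full copy each time.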

defalt · June 16, 2020, 9:20am

A colleague raised the question of whether it's possible to have cluster A hold the online primary shards and indices while all the replicas live in a replica cluster B. Does this concept have any use case?

azeiner · June 16, 2020, 2:05pm

Dear Mark, this is a very good point. What if you take a snapshot, let's say every day, from the hot node and delete all data older than 2 weeks? If you want to restore all data from an index (or at least 1 year or so), how can this be done, since the last snapshot does not have all the data available?

warkolm · June 16, 2020, 10:47pm

That's not possible, no.

warkolm · June 16, 2020, 10:48pm

Please see the delete section of https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshots-take-snapshot.html

system · July 14, 2020, 10:48pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
