Cluster(s) scale and failure design (ES 5.4)


(Eduard Kubanda) #1

Hi guys,
I decided to post a question about my ES cluster design because I think the current one sucks a bit.

My goal is to set up ES infrastructure which will:

  • be able to index/update 10,000-100,000 documents (and more) daily; there are also 100,000-1,000,000 documents which will be used for reading only
  • be failure resistant (I understand there are many levels to achieve this, but let's talk about a basic failure resistant design)

I can run 8 nodes at this moment in total (my resources limitation).

My design #1:

  • 2 clusters, run on separated machines
  • each cluster contains 2 master-only-eligible nodes, 2 data nodes
  • usage of design: cluster1 and cluster2 contain mirrored data; when an update/index is needed, I tell my application to use cluster1 while I update cluster2; if everything is OK, I tell the application to use cluster2 (and synchronize cluster2 -> cluster1)
  • goal is to have a data backup and not limit the application while a large index/update is running
  • problems: even number of data nodes (I know about the split-brain problem), quite complicated data migration

My design #2:

  • 1 cluster
  • 3 master-only-eligible-nodes, 5 data nodes, nodes run on separated machines
  • here come my questions: how do I design the update/index scenario so the application is not throttled or limited in the background, while still having a backup?
  • would you suggest using aliases together with ES's snapshot & restore functionality?

(I accidentally sent an incomplete post, sorry.)

Thank you.


(Mark Walkom) #2

That's bad, see https://www.elastic.co/guide/en/elasticsearch/guide/2.x/important-configuration-changes.html#_minimum_master_nodes
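
The setting behind that link is `discovery.zen.minimum_master_nodes` (ES 5.x); a minimal sketch for a cluster with 3 master-eligible nodes (the recommended odd number):

```yaml
# elasticsearch.yml on every master-eligible node (ES 5.x)
# quorum = (master_eligible_nodes / 2) + 1 = (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```

With only 2 master-eligible nodes per cluster, as in design #1, there is no safe value: 1 allows split brain, while 2 means that losing either node blocks master election entirely.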

Not sure what that means.


(Eduard Kubanda) #3

I accidentally sent an incomplete post, sorry. The post is updated.


(Mark Walkom) #4

What do you mean by that?


(Eduard Kubanda) #5

Consider a scenario where I need to do a large index/update/upsert operation on an index. This index is used by a web application for search purposes. If I start a large index/update/upsert operation, my search response times may suffer during it. Also, if the index/update/upsert operation ends in an incorrect state, I still need the original data.
I am considering mirroring the index within the cluster (design #2) and using aliases to point to the updated version of the index, but I am not sure about that.
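
The alias flip itself can be done atomically with the indices aliases API; a minimal sketch (the alias and index names are illustrative), written as payload construction so the swap logic can be checked without a live cluster:

```python
def alias_swap_actions(alias, old_index, new_index):
    """Build the actions body for POST /_aliases.

    Both actions are applied in a single atomic request, so searches
    against the alias never see an intermediate state with zero or
    two indices behind it.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# With the official elasticsearch-py client this would be sent as:
#   es.indices.update_aliases(
#       body=alias_swap_actions("products", "products_v1", "products_v2"))
```

Because the application only ever queries the alias, the swap is invisible to it; no search requests fail mid-flip.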


(Mark Walkom) #6

Define large?

Also, you should use 5.6 as that is latest :slight_smile:


(Eduard Kubanda) #7

I do not know exact numbers, but I think at the start there could be 100,000-1,000,000 and more updates daily, and it could grow over time.
Yep, I know about version 5.6, but I optimized a lot of things for 5.4, so I need to check the changes etc.


#8

That load does not seem very high to me, but I'm not updating any documents, just indexing new ones...

We are running 3 different clusters for log analysis, and the mid-sized one runs on two physical servers with 5 ES nodes on each machine. Looking in Kibana, we're getting 15M logs per day at the moment. And the machines are not really under any high load, but that is down to the HW resources of course. Also probably how complex the structure of your documents is...
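
For a pure index-heavy load like this, the bulk API is what keeps throughput up; a sketch of building bulk actions for the Python client's helper (the index name and documents are illustrative):

```python
def bulk_actions(index, docs):
    """Generate one plain index action per document for
    elasticsearch.helpers.bulk() -- no update/upsert overhead.
    The _type field is still required on ES 5.x."""
    for doc in docs:
        yield {"_op_type": "index", "_index": index,
               "_type": "doc", "_source": doc}

# With a live cluster:
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch(), bulk_actions("logs-2017.10.01", doc_stream))
```

A generator keeps memory flat even for millions of documents, since the helper streams batches to the `_bulk` endpoint instead of materializing everything.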

These machines are 5 years old with spinning disks. Just to give some context.
24x Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz / 125.90 GiB RAM

Each ES node has a dedicated drive.

Some other design possibilities.
Dedicate 3 machines as master nodes if you really want dedicated master nodes and run two master nodes per machine, one for each cluster. That would leave you with 5 machines for data. Maybe one primary cluster with 3 machines and one secondary with two...

Or do something like what I do and run multiple nodes per machine. That of course depends a lot on what sort of hardware you have...

My setup is not really "production grade" but in my case that is acceptable. Sounds like you will have higher demands on availability.

Good luck. And I would also recommend to use 5.5 or 5.6.

AB


(Eduard Kubanda) #9

Thanks for the context.
My problem is hardware resources. I do not have enough resources to run 2 clusters with 3 master-only-eligible nodes + data nodes each.
I think I can build my index/update and backup logic with one cluster using aliases. My idea is, as I wrote above, to have 3 master-only-eligible nodes and 5 data nodes. The nodes would run on separate machines to ensure resilience against hardware failure. My index/update logic must ensure that the web application (search features only) using ES is not throttled during indexing/updating:

  • 2 mirrored indices, with the alias pointing at index #1
  • when an update is needed, I start updating index #2 and make a snapshot of the updated index
  • if everything is fine, I point the alias at index #2 and restore the snapshot to index #1
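
A sketch of the snapshot/restore half of that cycle with the ES 5.x snapshot APIs (the repository, index and snapshot names here are assumptions, and the repository must already be registered); written as payload builders so the logic can be checked without a cluster:

```python
def snapshot_request(staging_index, repo="backups", snapshot_name="snap-1"):
    """Arguments for es.snapshot.create(): snapshot only the freshly
    updated index, not the whole cluster state."""
    return {"repository": repo, "snapshot": snapshot_name,
            "body": {"indices": staging_index,
                     "include_global_state": False}}

def restore_request(staging_index, live_index, repo="backups",
                    snapshot_name="snap-1"):
    """Arguments for es.snapshot.restore(): restore the snapshot of
    index #2 under the name of index #1 via rename.  The target index
    must be closed or deleted first, or the restore fails."""
    return {"repository": repo, "snapshot": snapshot_name,
            "body": {"indices": staging_index,
                     "rename_pattern": staging_index,
                     "rename_replacement": live_index}}

# Once the alias points at index #2, index #1 is idle and can be rebuilt:
#   es.snapshot.create(**snapshot_request("myindex_v2"))
#   es.snapshot.restore(**restore_request("myindex_v2", "myindex_v1"))
```

Since the alias has already moved off index #1 by step 3, the restore never competes with live search traffic on that index.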

Am I missing something? I know this scenario has to be tested, but in general, could it work?

Thank you.


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.