ELK Architecture Distribution Across Hardware to Achieve High Availability

I want to know how to distribute the node VMs across the physical servers to achieve HA. My cluster will consist of the specs below, and I want to build something like this.

1 Like

Here are some recommendations and reminders from my side:

  1. For a hot node, 8 vCPU, 2 TB disk, and 64 GB RAM is the best.
  2. To use the frozen node features you need a license.
  3. Have an even number (2, 4, 6, 8, …) of data nodes.
  4. Use shard allocation awareness:
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
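
To expand on point 4: awareness only works if each node also declares which zone it belongs to. A minimal sketch of the full setup, assuming two zones named zone1 and zone2 (the attribute name and zone values are arbitrary labels you choose, not built-in keywords):

```yaml
# elasticsearch.yml on each data node in zone1
# (nodes in the other zone use "zone2"):
node.attr.zone: zone1

# On every node (or set dynamically via the cluster settings API):
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
```

With forced awareness, Elasticsearch will refuse to place a primary and its replica in the same zone, so losing one physical server (one zone) leaves a full copy of the data available.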
2 Likes

Thank you for your reply, but I mean: for this sizing, what are the best practices for distributing them across physical servers? Should I, for example, put them all on the same server, or on 2 or 3 or however many, to ensure HA?

It is pretty poor etiquette to @ someone who has not answered on the thread at all. Please do not do this.

By my count, you are trying to put 23 VMs with 202 vCPU and 856 GB vRAM on 5 physical servers? Do you have 5 servers that can cope with this load? What are your server specs? What sort of disk/IO are you using? Are you intending to overcommit resources?

Generally, Elastic recommends dedicated resources. Your 7 (sic) hot nodes cannot be distributed across only 2/3/4/5/6 servers without at least an unattractive lack of symmetry (because 7 is prime).

I am not quite sure what "Integration" means in this context, but I'd generally keep my integration and prod environments separate.

There's also no indication where you got your table of node counts from, nor anything at all about your use case.

3 Likes

How did you arrive at those node counts and that configuration?

For example, your cold nodes have the same vCPU/RAM count as your hot nodes. Normally you would use fewer resources for your cold nodes; is only the disk different on them?

What is "Integration"? Is it the Logstash instances?

Do you already have a license, or are you in the process of acquiring one? I think Elastic can help you design your cluster according to your needs.

2 Likes

This is a security use case (SIEM). The sizing is based on an 800 GB per day ingestion rate with a 90-day retention period: 7 days on hot, 23 on cold, and 60 on frozen. As for your other questions: I made this post to learn what I should do on the physical-server side, because I didn't find any documentation that explains this in depth. I want to know the best practices for choosing hardware specs that will handle this number of nodes. Don't take my assumption of 5 servers as fixed; it can be one, or 7, or any number that works well, depending on your opinion and best practices.
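
For a retention split like this (7 days hot, cold until day 30, frozen until day 90), the tier transitions are normally enforced with an ILM policy rather than by hand. A hedged sketch of what that could look like in Kibana Dev Tools; the policy name, rollover thresholds, and snapshot repository name (`my-s3-repo`) are illustrative placeholders, and the frozen phase requires a searchable-snapshot-capable license:

```json
PUT _ilm/policy/siem-logs
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_primary_shard_size": "50gb" }
        }
      },
      "cold": {
        "min_age": "7d",
        "actions": {}
      },
      "frozen": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": { "snapshot_repository": "my-s3-repo" }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

The empty cold-phase actions still move indices to cold-tier nodes, because ILM migrates data between tiers implicitly; `min_age` is measured from rollover, which is why the phase boundaries line up with the 7/30/90-day split.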

We are in the proposal phase, and yes, we will acquire the Enterprise license if we get the project. That is why I'm asking: I want to learn the best practices for designing the cluster and for getting suitable hardware specs that meet the sizing needs, including specifics such as network cards, CPU frequency, RAM speed, etc.

Thanks for clarifying.

Note that was not at all obvious from what you actually wrote!

To be honest, I'd suggest you use this phase to engage Elastic pre-sales/sales/prof-services, as they can advise really well, in part because I suspect you will be happier to share all the finer details of your SIEM use-case with them, rather than on a public forum. You should also IMO at least consider a hosted/cloud option, as for a start it will reduce the day-to-day mgmt cost significantly.

Nothing wrong with that per se, but this sort of 3-tier solution generally, not always but often, means 3 IO profiles: really fast IO (NVMe, say), not-quite-as-fast IO, and object storage (S3) for frozen. And you wrote nothing at all about IO so far. Your original table should have 2 extra columns, one for storage type and one for storage capacity.

If it must go in your own data center, I’m fairly old school, so personally I wouldn’t virtualise this at all. I’d go with a standard bare-metal server model, using a few different hardware profiles depending on role. All disk space is locally attached.

  • Masters: 3x small server profile, RAM-focused
  • Kibana: 2x small servers
  • Fleet: 2x small servers
  • Hot tier: 6x servers with ~2-3 TB of the fastest IO you can get (e.g. really good NVMe) each, more CPU, 64 GB RAM
  • Warm tier: 2x servers, similar to the hot spec but with significantly more disk capacity per node
  • Frozen tier: 2x smaller servers (similar to masters)

Consider having a couple of spare servers, if possible.

ML nodes are harder for me to size sensibly without knowing how much ML you’ll actually be running, and what kind of ML workloads you are planning.

Virtual solutions work too, of course, and in some sense are more easily adapted and resized/grown as you learn more about your data and environment.

1 Like

Yes, you are right. This is the full sizing table, with 67 TB of S3-like object storage, and I will go with virtualization. That is why I'm asking for the best architecture. Again, thank you very much for your reply.

There are issues with S3-adjacent-but-not-actually-S3 storage solutions that come up on here now and again.

btw, you have asked not-dissimilar questions several times on this forum over the last 2 years.

Posted Jan 2024. 25 months ago. And even that thread was a duplicate.

Let me ask you - how much effort are YOU putting in here? I and others are very happy to help people, but what have you learned over those last 25 months? Maybe take a course or two, try to get an Elastic Engineer certification perhaps ? There's even an Elastic Certified SIEM Analyst !!

Because of this, I asked, and I believe these are two distinct questions that complement each other in building a High Availability (HA) cluster. Thank you again

Now you have confused me completely. So after asking in the Jan 2024 threads, you have built, managed, and maintained a SIEM cluster these last 2 years, building your knowledge and understanding over time?

If so, your current cluster is configured how? And what volume does it handle?

And now you just don't know how to build that out to "achieve high availability"?

Have I got it right?

I’m genuinely trying to understand where you actually are, and what you’ve personally done in this space so far. Because if, and I do mean if, your hands-on experience is limited, then I again strongly suggest engaging Elastic directly (pre-sales / professional services), or bringing in an IT professional who has real-world experience deploying and operating SIEM clusters. What you’re describing should not, in my opinion, be someone’s first Elastic deployment.

And the reason I am pushing on it is simple: if you’re relatively inexperienced with Elasticsearch and the surrounding tools/stack, that inexperience is IMO a much bigger risk factor right now than server count or hardware spec. To use an analogy: you’re asking which types of planes are best to fly VIPs around - and I’m trying to work out whether you or anyone on your team actually even has a pilot’s licence. :grinning_face:

1 Like

Thank you very much, I understand your point. There was some confusion, and I’ll be communicating with the Elastic Presales team.

1 Like

btw that appears to be not enough disk space.

800 GB/day. At least one primary and one replica. Any sort of HA requires indices with at least one replica. So 2x your data volume.

7 days of data on hot: 7 × 2 × 800 GB ≈ 11.2 TB. You also want at least ca. 20% free disk space per node, and preferably more (consider: if a node dies, that node's shards need to live somewhere else, temporarily at least).

Cold tier has same issue.
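
The arithmetic above generalizes to any tier. A quick back-of-envelope helper, assuming 1 replica and 20% free-space headroom (the figures used in this thread, not universal constants), and treating the 800 GB/day as the size once stored in Elasticsearch, which, as noted below, may differ from the raw size:

```python
def tier_disk_gb(daily_gb: float, retention_days: int,
                 replicas: int = 1, free_space: float = 0.20) -> float:
    """Disk to provision for a tier: daily volume times retention,
    times (primary + replicas), divided so ~20% stays free per node."""
    stored = daily_gb * retention_days * (1 + replicas)
    return stored / (1 - free_space)

# This thread's numbers: 800 GB/day, 7 days hot, 23 days cold.
hot = tier_disk_gb(800, 7)    # 11200 GB stored -> ~14 TB provisioned
cold = tier_disk_gb(800, 23)  # 36800 GB stored -> ~46 TB provisioned
```

So the hot tier alone needs roughly 14 TB across its nodes, and the cold tier roughly 46 TB, before accounting for any expansion of the raw data at index time.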

And is the 800GB/day the size of the data when stored into Elasticsearch, or the raw data size?

800 GB/day raw data

So you need to index that, or a representative sample, and see how much disk space it uses when stored in Elasticsearch.

This depends on a variety of factors, the data, the mapping, what fields you store, _source settings, etc.

It can easily be much more than 800GB when stored into one or more Elasticsearch indices. Even before you consider replicating the data.

It is worth noting that if they are using a commercial license, this opens up the possibility of synthetic _source, which can significantly impact the storage calculation. I would in this case leave the sizing to the Elastic pre-sales team, as they likely have superior data on storage requirements for different types of integrations and the real-world impact of synthetic _source.
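
For context, synthetic _source skips storing the original `_source` field on disk and reconstructs it from doc values at query time, which is where the storage savings come from. A hedged sketch of how it has been enabled at index creation (the index name is illustrative, the exact syntax varies by Elasticsearch version, and it requires an appropriate license):

```json
PUT logs-siem
{
  "mappings": {
    "_source": { "mode": "synthetic" }
  }
}
```

The trade-off is that the reconstructed document may differ cosmetically from what was ingested (field ordering, for example), and not every field type supports it, so it is worth validating against your actual SIEM data.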

1 Like