Elastic Architecture on Ex-Hypervisor servers

Hello ELK community,

I'm currently implementing my first ELK infrastructure in production.

The plan is to ingest S3 logs sent from 54 source servers on which we installed Filebeat.

We have 3 physical servers available, each with 1 TB of RAM and 2 x Intel(R) Xeon(R) Gold 6426Y CPUs with Hyper-Threading (16 physical cores, 32 threads, 2.5 GHz).

These servers are also equipped with NVMe disks with a capacity of around 23 TB per server.

The plan is to build a cluster from these 3 servers, making all of them data + master nodes (2 data nodes per physical server), plus Logstash on each of them as the target for Filebeat. Logstash would only do simple filtering (drop fields) and format events as JSON.

The cluster will be queried every 5 minutes by a NiFi PaginatedJsonQueryElasticsearch processor fetching 10,000 lines each time (HTTPS queries) with a keepalive of 10 minutes. This was already in place, and I can only change the frequency and the keepalive time; it has to go through NiFi for reasons specific to our organization.

Ingestion into the ELK cluster is continuous (around 1 TB per day) with a retention of 7 days.
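To put the numbers above in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units, one replica per shard, and an on-disk index size roughly equal to the raw log size, none of which has been measured yet. The function names are made up for illustration.

```python
# Back-of-the-envelope sizing for the figures in this post.
# Assumptions (not measured): decimal units, one replica per shard,
# and on-disk index size roughly equal to raw log size.

TB = 10**12  # bytes in a decimal terabyte


def average_ingest_mb_per_s(daily_bytes: int) -> float:
    """Average sustained ingest rate in MB/s for a given daily volume."""
    return daily_bytes / 86_400 / 10**6


def retained_bytes(daily_bytes: int, retention_days: int, replicas: int = 1,
                   index_to_raw_ratio: float = 1.0) -> float:
    """Disk needed to hold the retention window, including replica copies."""
    return daily_bytes * retention_days * (1 + replicas) * index_to_raw_ratio


if __name__ == "__main__":
    daily = 1 * TB                 # ~1 TB ingested per day
    cluster_capacity = 3 * 23 * TB  # 3 servers with ~23 TB of NVMe each
    needed = retained_bytes(daily, retention_days=7)
    print(f"average ingest: {average_ingest_mb_per_s(daily):.1f} MB/s")
    print(f"retained data:  {needed / TB:.0f} TB of {cluster_capacity / TB:.0f} TB")
```

Under these assumptions the average ingest rate is about 11.6 MB/s and 7 days of data with one replica takes about 14 TB of the roughly 69 TB available. Peaks will be higher than the average, and the real index-to-raw ratio depends on the mappings, so this is only a starting point.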

For now I'm planning to have all roles of the cluster (including Logstash) on these physical machines. It seems to go against ELK best practices to have master and data roles on the same host, but I'm wondering whether, with machines as powerful as these (which were used as hypervisors before), this setup would be feasible and, above all, safe. Otherwise we always have the option of deploying VMs to host some of the roles that are not disk-critical.

Indexing can be delayed by a couple of minutes if needed. What I'm mostly worried about is resiliency, and backpressure reaching the Filebeat agents to the point where it starts affecting the servers we are monitoring.

TL;DR: How safe is it to have all ELK roles + Logstash on physical servers that are continuously ingesting while being queried by NiFi?

Thank you & regards.

I'm pretty sure the previous message is AI-generated. It's certainly not very accurate. Three nodes won't suffer from a split-brain as described, and frankly it's probably ok to have them all do everything. No need to have two nodes per host or anything complicated like that.

I mean in practice you're going to have to do some benchmarking to generate a realistic workload and verify that it works as expected, but you should start with the simplest possible architecture first and only make it more complicated once you have some evidence that the extra complexity is needed.

Thank you, David, for your feedback, which I greatly appreciate. Please excuse the inaccuracies in my previous message.

You are correct: on a 3-node Elasticsearch 8.x cluster, split-brain is no longer a real risk, and the "all-in-one" architecture is perfectly viable for getting started.
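As a rough illustration of why three master-eligible nodes are enough, here is the majority arithmetic behind quorum-based master election. This is a simplified sketch, not Elasticsearch's actual implementation, and the function names are hypothetical.

```python
# Simplified majority arithmetic for quorum-based master election.
# This is an illustration only, not Elasticsearch's actual code.


def quorum(master_eligible: int) -> int:
    """Smallest majority of a set of voting nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many voting nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    nodes = 3
    print(f"{nodes} voting nodes: quorum={quorum(nodes)}, "
          f"tolerated failures={tolerated_failures(nodes)}")
```

With 3 voting nodes the quorum is 2, so the cluster keeps a master after losing one node, and a network partition can never leave a majority on both sides at once.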

My analysis was too influenced by older practices (pre-7.x) and lacked nuance. The real point of concern remains the CPU contention between Logstash and Elasticsearch at 1 TB/day, but that's a resource issue, not a coordination problem.

Thank you again for your clear explanations and high standards.