Should I deploy Elasticsearch in Docker on one machine?

I want to achieve the best performance for Elasticsearch on a single machine. Right now I'm running three Elasticsearch instances in Docker on that one machine. Should I keep this setup for better performance, or should I deploy Elasticsearch directly on the host?

Welcome!

Docker or not Docker is not really the concern here.
You should not run multiple nodes on the same physical machine unless you have more than 64 GB of RAM (a node's JVM heap should stay under ~32 GB, so only beyond that point can extra nodes actually use the additional memory).

Yes, actually I have 256 GB of memory on a Dell server, so in this case should I run multiple nodes?

Do you have only one machine like this available?

Note that if the machine stops, the whole cluster will stop working.
That's why it's better to run on multiple machines.

I want to achieve the best performance for Elasticsearch

Are you looking for indexing speed?

Are you looking for search speed?

What kind of use case is it?

I'd run at most 3 nodes on that machine. Elasticsearch/Lucene make heavy use of the filesystem cache, so all that RAM will be used anyway 🙂


Yes, I only have one. This machine is used for both indexing and searching, so I want to find a balance between the two. It's OK if this machine stops.

How many indexing operations are you planning to do?
How many search operations are you planning to do?

What kind of use case is it?

Well, actually it's a network flow monitoring system. The network flows are parsed and processed, then stored in Elasticsearch, waiting to be searched.
The flow volume is huge and generates at least 8k indexing operations per second (161 GB of data per day, 1 billion records in total). There are also some cron jobs that search the data every minute for a web server to use and display, but only a small part of the data and the statistical results are actually used each minute.

I'd start with one node and, if IO is not overloaded, maybe add one more data node if needed, plus a small master-eligible-only node.
I'd disable index replication, as you said you don't really care about data loss.

Yes, I've now observed disk IO performance for 10 hours and everything is fine; peak IO usage is about 50%. Can I just disable index replicas without shutting down the cluster? How should I do that?

If you have only one node running, you don't need to.
If you have many nodes, you need to update the index settings using the index settings API.

It's a per-index setting, not a cluster-level one.
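For existing indices, a minimal sketch of that API call could look like this (the host and port are just the defaults from your Logstash config; _all targets every index, so narrow it to a pattern if you only want some of them):

curl -X PUT "http://127.0.0.1:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index": { "number_of_replicas": 0 } }'

Note this only changes indices that already exist; newly created indices pick up their replica count from an index template.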

Thanks! But I wrote my template.json as:
{
  "template": "*",
  "settings": {
    "number_of_replicas": 0,
    "auto_expand_replicas": "false"
  }
}
and in my Logstash settings:

output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "wizard-log-%{[pro_type]}-%{+YYYY.MM.dd}"
    template => "/usr/share/logstash/config/template.json"
    #user => "elastic"
    #password => "changeme"
  }
}

but newly created indices still get a replica. What should I do?

That is not a valid template definition.
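If you're on Elasticsearch 6.x or later, the legacy template API expects an index_patterns array instead of the old template field. A minimal sketch of what the file could look like (the pattern here is just a guess based on your Logstash index name):

{
  "index_patterns": ["wizard-log-*"],
  "settings": {
    "number_of_replicas": 0,
    "auto_expand_replicas": "false"
  }
}

Also keep in mind that if a template with the same name already exists, the Logstash elasticsearch output only replaces it when template_overwrite => true is set.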


Since we at ElastiFlow have probably done more to put flow data into Elasticsearch than anyone else (including a cluster that can ingest 1.2M records/sec), I thought I would chime in and share what our extensive testing has revealed.

  1. There is no need to run multiple nodes on a server that has 32 or fewer CPU cores. Above 32 cores there can be a very real benefit, but the details are important.

  2. If running multiple nodes, each node should be bound to specific cores. This can be achieved pretty easily using Docker (see the sketch after this list). If the CPU is an AMD EPYC, make sure that the cores selected span the fewest chiplets possible. For example, a 64-core EPYC has 8 chiplets; you would run two ES nodes, each bound to the cores of 4 chiplets. This ensures the nodes maintain the closest affinity to their related L3 caches.

  3. If the server has multiple sockets, NEVER EVER EVER allow a single ES node to cross the fabric between sockets. This is achieved the same way as #2... binding the nodes to specific cores. NOTE: Crossing the socket-to-socket fabric on an Intel Xeon doesn't hurt as much as it does on an AMD EPYC, but it is still not optimal.

  4. Populate all memory channels. Failing to do so will cause a significant drop in performance. This is also true when running only a single node.

  5. Provide each node with its own dedicated SSDs. Shared disks or RAIDs will negatively impact performance.
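
To make #2 and #3 concrete, here is a hypothetical sketch using Docker's --cpuset-cpus/--cpuset-mems flags. The core IDs, NUMA node IDs, and image tag are assumptions; check your actual topology first with lscpu or numactl --hardware:

# Pin one node to the cores and local memory of socket 0...
docker run -d --name es-node-0 \
  --cpuset-cpus="0-31" --cpuset-mems="0" \
  -e "node.name=es-node-0" \
  docker.elastic.co/elasticsearch/elasticsearch:7.17.9

# ...and the second node to socket 1, so neither crosses the socket fabric.
docker run -d --name es-node-1 \
  --cpuset-cpus="32-63" --cpuset-mems="1" \
  -e "node.name=es-node-1" \
  docker.elastic.co/elasticsearch/elasticsearch:7.17.9

On an EPYC you would pick the ranges chiplet group by chiplet group instead of socket by socket.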

Following these recommendations will provide outstanding performance and allow you to best take advantage of a larger physical server.

BTW, you don't mention which flow collector you are using, but if it isn't ElastiFlow you should take a look at what we provide.
https://www.elastiflow.com/

Rob

That's awesome! But what if I have two CPUs giving 64 logical cores in total, where 32 of them make up one NUMA node? Currently I'm running three nodes on this two-socket machine, so should I remove one node? I've noticed that my system load is very high (30, 31, 30). Will these suggestions lower the load average?
