The question is about the finer points of configuration. The situation: one physical server, two CPUs, 20 cores in total. The task is to load a large amount of text into it - about 250 million records, each a couple of paragraphs long. There will be few simultaneous users, usually no more than one or two. In other words, the goal is to speed up search, not to increase throughput.
Does it make sense to run Elasticsearch on one physical server with multiple data nodes, e.g. via docker compose? Or, since it is a single physical server, is one node enough, and would the overhead actually be lower that way?
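For concreteness, this is roughly the multi-node-on-one-box setup I have in mind, sketched with the Docker SDK for Python instead of a compose file just to keep all the snippets in one language; the image tag, node names, and heap sizes are placeholders, not recommendations:

```python
# Rough sketch of "several data nodes on one physical server": two Elasticsearch
# containers joined into one cluster over a shared Docker network.
# Host prerequisites (e.g. vm.max_map_count, ulimits) are omitted here.
import docker

client = docker.from_env()
client.networks.create("es-net", driver="bridge")

common_env = {
    "cluster.name": "single-box",
    "discovery.seed_hosts": "es01,es02",
    "cluster.initial_master_nodes": "es01,es02",
    "xpack.security.enabled": "false",   # only to keep the sketch simple
    "ES_JAVA_OPTS": "-Xms8g -Xmx8g",     # per-node heap, placeholder size
}

for name in ("es01", "es02"):
    client.containers.run(
        "docker.elastic.co/elasticsearch/elasticsearch:8.13.4",  # example tag
        name=name,
        hostname=name,
        environment={**common_env, "node.name": name},
        network="es-net",
        ports={"9200/tcp": 9200} if name == "es01" else None,   # expose only one node
        detach=True,
    )
```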
Do I understand correctly that one shard runs on one core and no more? Or is it one data node that runs on one core? I read somewhere that Apache Lucene only runs in a single thread. And what is an Apache Lucene instance: a shard or a node?
What ratio of nodes to shards is needed?
-
Does it make sense to use docker compose on one physical server with multiple data nodes? Or does it make more sense simply to reduce memory usage per node?
-
How many shards should I specify? Should the number be equal to the number of data nodes, or higher? What is the right ratio of data nodes to shards? If it makes sense to have more shards than data nodes, how do you calculate that relative to the number of CPU cores?
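For concreteness, I mean the primary shard count that gets fixed at index creation time, for example (assuming a recent 8.x Python client; the index name and the values are placeholders, not a recommendation):

```python
# Minimal sketch: the shard count is set once at index creation and cannot be
# changed later without reindexing; the replica count can be changed at any time.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="docs",                     # hypothetical index name
    settings={
        "index": {
            "number_of_shards": 20,   # e.g. one primary per core - placeholder value
            "number_of_replicas": 0,  # no replicas on a single machine
        }
    },
)
```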
-
Does it make sense to set the number of replicas above 0 if everything is on one physical server? (If the physical server dies, the data dies with it, so that is not a priority, and there is no spare RAM.) Does the number of replicas increase search speed? Or does it only increase throughput (i.e. simultaneous requests), which does not matter for a single user? And is it right that one replica exactly doubles RAM consumption across all data nodes?
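To clarify what I would be changing: the replica count, unlike the shard count, is a dynamic setting, so it can be flipped on an existing index to compare memory use and search latency. A sketch with the same assumed 8.x Python client and placeholder index name:

```python
# Sketch: number_of_replicas is dynamic, so it can be raised and lowered on a
# live index to measure the effect on RAM/disk and on query latency.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Turn replicas on for the hypothetical "docs" index ...
es.indices.put_settings(
    index="docs",
    settings={"index": {"number_of_replicas": 1}},
)

# ... and back off again if it only costs resources without helping latency.
es.indices.put_settings(
    index="docs",
    settings={"index": {"number_of_replicas": 0}},
)
```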
What Java memory parameters (heap size) should it be run with?
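By memory parameter I mean the JVM heap, normally passed via ES_JAVA_OPTS (or jvm.options). A hypothetical single-node sketch, again with the Docker SDK for Python; the 16g heap and the image tag are assumptions, not recommendations:

```python
# Hypothetical sketch: a single-node Elasticsearch container with an explicit,
# fixed JVM heap (-Xms equal to -Xmx). The size is a placeholder; the usual
# guidance is to leave a large share of RAM to the OS page cache and to stay
# below ~32 GB so compressed object pointers remain enabled.
import docker

client = docker.from_env()

client.containers.run(
    "docker.elastic.co/elasticsearch/elasticsearch:8.13.4",  # example tag
    name="es-single",                                        # hypothetical name
    environment={
        "discovery.type": "single-node",
        "ES_JAVA_OPTS": "-Xms16g -Xmx16g",                   # fixed heap, placeholder size
    },
    ports={"9200/tcp": 9200},
    detach=True,
)
```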
As I understand it, a shard is a separate instance of Apache Lucene.