How should I configure the number of nodes, shards and replicas?

I have one server with the following spec:

  • RAM: 128 GB
  • HDD: 10 TB * 2
  • SSD: 256 GB
  • CPU: 32 cores

I am going to manage up to 10 TB of text data using Elasticsearch 7.9.3, and I plan to use Docker to deploy Elasticsearch and Kibana.
Please help me determine the proper number of nodes, shards, etc.
What is the recommended architecture?

Welcome to our community! :smiley:

It depends. What sort of text data is this?


It is web page data from a scraper.

I am building a search engine for online forums, marketplaces, etc.

I tried with 1 node / 320 shards / 1 replica for each shard.
I created one index on it and stored nearly 3 TB / 60 million documents.
The issue is that when I search for something, CPU usage is very low and the search is very slow.
I think this is because of an inappropriate configuration.
Kindly guide me on this issue.
Thank you very much.

I would recommend looking at disk I/O and iowait, as disk performance is often the limiting factor in Elasticsearch, especially when not using SSDs.

320 shards for 3TB of data also sounds excessive.

It is HDD.
I need to store up to 10 TB of data; how many shards should I use?

And how many nodes should I have?

I suspect 200 shards should be sufficient. Once you have ingested data, run realistic queries and monitor disk performance and iowait.
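As a rough sanity check on that number, the arithmetic behind it can be sketched as below (a sketch only; the roughly 25-50 GB per-shard target is a common rule of thumb, not a guarantee for this workload, and the real answer comes from the benchmarking mentioned above):

```python
import math

def shard_count(total_data_gb: float, target_shard_gb: float) -> int:
    """Primary shards needed so each ends up holding about target_shard_gb."""
    return math.ceil(total_data_gb / target_shard_gb)

# 10 TB of data, aiming for ~50 GB per shard:
print(shard_count(10_000, 50))  # -> 200

# A smaller 25 GB target would double that:
print(shard_count(10_000, 25))  # -> 400
```

Either way, 320 shards for 3 TB puts each shard well under 10 GB, which is why it looks excessive.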

What type of data are you indexing?

It is text data scraped from web pages.

Hi Rider

A shard should be at most around 25 GB in size; beyond that, performance will decrease. This also depends on the underlying hardware: if you are using SSDs, this size is absolutely fine.

You can have 1 index with 5 primary shards and 1 replica.

Allocate your heap size as 32 GB, and not more than that.

How much data are you expecting to store in a single index? Plan according to that how many indexes you want, or use the rollover concept for your index.

How much data are you expecting to push to Elasticsearch per second? Plan to have 5 or more nodes so you can have that many primary shards [number of nodes = number of primary shards of your index].

Perform a load test on your ES and check the response; that will be the true validation of your design.

If you have 20 TB of HDD and 256 GB of SSD on the same machine, think about what advantage that actually gives your index, since the index will be distributed randomly across the data paths you define in the config file.

Not necessarily.

Yes, it is not a necessity. That is why I mentioned two things:

  1. Underlying Hardware
  2. Load Testing of your Data Injection into ES.

These settings are based on my experience in my environment / project.

Thank you for your kind explanation.
Just wanted to double-check:
so in order to store 10 TB of data in a single index, how many nodes, primary shards and replicas should I have?

If you want to keep 10 TB of data in a single index, you can do the following:

  1. Create an alias "myindex" pointing to an index named "myindex-000001".

  2. Define a rollover at 10 to 20 million records.

That way you will have indexes like myindex-000001 ... -000020 ... -000300, while your application always points to the alias, so you do not have to burden a single index and its shards and replicas.
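For reference, a rollover request along those lines might look like the following (the thresholds here are illustrative only, not recommendations from this thread; max_docs and max_size are the conditions available in Elasticsearch 7.9, and the body would be sent as POST /myindex/_rollover against the alias):

```python
import json

# Illustrative rollover conditions (example thresholds, not recommendations).
# Sent to the alias: POST /myindex/_rollover
rollover_body = {
    "conditions": {
        "max_docs": 20_000_000,  # roll over after ~20 million documents
        "max_size": "50gb",      # or once the index grows past this size
    }
}
print(json.dumps(rollover_body, indent=2))
```

Whichever condition is met first triggers the rollover, and writes through the alias move to the newly created index.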

  3. You can create the index myindex-000001 with 5 primary shards and 1 replica.
     If you are inserting many documents per second, say 10k, then it is better to have more distribution, say 10 primary shards, to support your insert I/O.

You can keep 1 or 2 replicas depending on your disaster-recovery plan for the nodes.

  4. With 5 primary shards and 1 replica, you can go with 5 nodes.
     With 10 primary shards and 1 or 2 replicas, you can go with 10 nodes.

This way, across 5 or 10 nodes, you can divide your 10 TB of storage in a distributed manner.

The ideal shard size depends a lot on the use case, the data and the mappings used. This is something that you need to determine by running benchmarks with realistic data and queries.

Given that the user is planning on using a single index, the number of primary shards needs to be significantly larger.

The goal here is to use compressed pointers, so the limit is strictly below 32GB, usually somewhere around 30GB.
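That sizing rule can be expressed as below (a sketch only; the exact compressed-oops threshold varies by JVM version and platform, so the ~31 GB cap here is a conservative stand-in, and Elasticsearch also recommends leaving at least half of RAM to the filesystem cache):

```python
def recommended_heap_gb(ram_gb: float, oops_cap_gb: float = 31.0) -> float:
    """Half the machine's RAM, capped below the compressed-oops limit."""
    return min(ram_gb / 2, oops_cap_gb)

print(recommended_heap_gb(128))  # -> 31.0 (capped by compressed oops)
print(recommended_heap_gb(16))   # -> 8.0  (capped by half of RAM)
```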

This works for time series data but may not necessarily apply to this use case.

This is indeed the only way to properly optimize the cluster and determine the optimal shard size.

For 5 nodes, how many master nodes should I have?

This is my docker-compose.yml.
Could you kindly check whether it is appropriate?

version: '3.2'

services:

  elasticsearch1:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION}
    environment:
      - cluster.name=docker-cluster
      - node.name=elasticsearch1
      - "ES_JAVA_OPTS=-Xmx2g -Xms2g"
      # - ELASTIC_PASSWORD=reZeP6crgHBGVsKeAFyWnduTzcwyB4qR
      # - xpack.monitoring.collection.enabled=true
      # - xpack.monitoring.exporters.remote.type=http
      # - xpack.monitoring.exporters.remote.host=monitor
      - discovery.seed_hosts=elasticsearch1,elasticsearch2,elasticsearch3,elasticsearch4,elasticsearch5
      - cluster.initial_master_nodes=elasticsearch1,elasticsearch2,elasticsearch3,elasticsearch4,elasticsearch5
      # - xpack.license.self_generated.type=basic
      # - xpack.security.enabled=true
      # - xpack.security.http.ssl.enabled=true
      # - xpack.security.http.ssl.key=$ELK_CERTS_DIR/elasticsearch1/elasticsearch1.key
      # - xpack.security.http.ssl.certificate_authorities=$ELK_CERTS_DIR/ca/ca.crt
      # - xpack.security.http.ssl.certificate=$ELK_CERTS_DIR/elasticsearch1/elasticsearch1.crt
      # - xpack.security.transport.ssl.enabled=true
      # - xpack.security.transport.ssl.verification_mode=certificate
      # - xpack.security.transport.ssl.certificate_authorities=$ELK_CERTS_DIR/ca/ca.crt
      # - xpack.security.transport.ssl.certificate=$ELK_CERTS_DIR/elasticsearch1/elasticsearch1.crt
      # - xpack.security.transport.ssl.key=$ELK_CERTS_DIR/elasticsearch1/elasticsearch1.key
      - indices.query.bool.max_clause_count=10240
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /work/storage1/enigma-volume/elasticsearch1:/usr/share/elasticsearch/data
      # - ./elk-certs:$ELK_CERTS_DIR
    ports:
      - "127.0.0.1:9200:9200"
    container_name: "nexvisions-torscraper-elasticsearch1"

  elasticsearch2:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION}
    environment:
      - cluster.name=docker-cluster
      - node.name=elasticsearch2
      - "ES_JAVA_OPTS=-Xmx2g -Xms2g"
      - discovery.seed_hosts=elasticsearch1,elasticsearch2,elasticsearch3,elasticsearch4,elasticsearch5
      - cluster.initial_master_nodes=elasticsearch1,elasticsearch2,elasticsearch3,elasticsearch4,elasticsearch5
      - indices.query.bool.max_clause_count=10240
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /work/storage1/enigma-volume/elasticsearch2:/usr/share/elasticsearch/data
    container_name: "nexvisions-torscraper-elasticsearch2"

  elasticsearch3:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION}
    environment:
      - cluster.name=docker-cluster
      - node.name=elasticsearch3
      - "ES_JAVA_OPTS=-Xmx2g -Xms2g"
      - discovery.seed_hosts=elasticsearch1,elasticsearch2,elasticsearch3,elasticsearch4,elasticsearch5
      - cluster.initial_master_nodes=elasticsearch1,elasticsearch2,elasticsearch3,elasticsearch4,elasticsearch5
      - indices.query.bool.max_clause_count=10240
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /work/storage2/enigma-volume/elasticsearch3:/usr/share/elasticsearch/data
    container_name: "nexvisions-torscraper-elasticsearch3"

  elasticsearch4:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION}
    environment:
      - cluster.name=docker-cluster
      - node.name=elasticsearch4
      - "ES_JAVA_OPTS=-Xmx2g -Xms2g"
      - discovery.seed_hosts=elasticsearch1,elasticsearch2,elasticsearch3,elasticsearch4,elasticsearch5
      - cluster.initial_master_nodes=elasticsearch1,elasticsearch2,elasticsearch3,elasticsearch4,elasticsearch5
      - indices.query.bool.max_clause_count=10240
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /work/storage2/enigma-volume/elasticsearch4:/usr/share/elasticsearch/data
    container_name: "nexvisions-torscraper-elasticsearch4"

  elasticsearch5:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION}
    environment:
      - cluster.name=docker-cluster
      - node.name=elasticsearch5
      - "ES_JAVA_OPTS=-Xmx2g -Xms2g"
      - discovery.seed_hosts=elasticsearch1,elasticsearch2,elasticsearch3,elasticsearch4,elasticsearch5
      - cluster.initial_master_nodes=elasticsearch1,elasticsearch2,elasticsearch3,elasticsearch4,elasticsearch5
      - indices.query.bool.max_clause_count=10240
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /work/storage1/enigma-volume/elasticsearch5:/usr/share/elasticsearch/data
    container_name: "nexvisions-torscraper-elasticsearch5"

  kibana:
    image: docker.elastic.co/kibana/kibana:${ELK_VERSION}
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch1:9200
      - ELASTICSEARCH_HOSTS=http://elasticsearch1:9200
      - NODE_OPTIONS="--max-old-space-size=8192"
      # - xpack.monitoring.ui.container.elasticsearch.enabled=true
      # - xpack.security.transport.ssl.verification_mode=certificate
      - elasticsearch.requestTimeout=300000
      - elasticsearch.startupTimeout=500000
      # - ELASTICSEARCH_USERNAME=kibana_system
      # - ELASTICSEARCH_PASSWORD=XDMk2JLNPLUWjlZrXdxV
      # - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=$ELK_CERTS_DIR/ca/ca.crt
      # - SERVER_SSL_ENABLED=true
      # - SERVER_SSL_KEY=$ELK_CERTS_DIR/kibana/kibana.key
      # - SERVER_SSL_CERTIFICATE=$ELK_CERTS_DIR/kibana/kibana.crt
    volumes:
      - ./elk-certs:$ELK_CERTS_DIR
    ports:
      - "127.0.0.1:5601:5601"
    restart: always
    container_name: "nexvisions-torscraper-kibana"