Elastic server hardware requirements

Aayushi_Jain · November 22, 2018, 6:30am

We have around 5 billion records per year, and the retention period is 1 year. Typical queries will involve a date range selection, around 10 filters and some aggregations. Query latency should be as low as possible, and the data will be queried very frequently.

What would the hardware requirements be for such a scenario? I know it depends on various other factors as well, but I'd like a general idea of how many indices/nodes/shards should be created.

Aayushi_Jain · November 22, 2018, 6:34am

Elasticsearch version 6.4

Christian_Dahlqvist · November 22, 2018, 7:08am

What is the average size of your documents? What is the size of data on disk?

Aayushi_Jain · November 22, 2018, 7:22am

Currently 145 million records are taking 60GB of disk space.

Christian_Dahlqvist · November 22, 2018, 7:25am

Does that include replica shards?

If I have calculated correctly, that corresponds to around 2TB of data on disk, assuming it grows proportionally.
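The 2TB figure is a linear extrapolation from the sample above (60GB for 145 million records, scaled to 5 billion). A minimal sketch of that arithmetic, assuming on-disk size grows linearly with record count and that the 60GB excludes replicas (the question above is still open); the function name is hypothetical:

```python
def project_disk_gb(sample_records, sample_size_gb, target_records):
    """Linearly extrapolate on-disk size from a sample.

    Assumption: size per record stays constant as the index grows, and
    sample_size_gb covers primary shards only (no replicas).
    """
    return sample_size_gb * target_records / sample_records


per_year_gb = project_disk_gb(145_000_000, 60, 5_000_000_000)
bytes_per_record = 60e9 / 145_000_000  # decimal GB, roughly 414 bytes per record

print(f"~{per_year_gb:,.0f} GB per year before replicas")
```

With one replica per shard, the total disk footprint roughly doubles.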

If you truly want optimal performance, you generally want all data cached in memory on the host in the file system page cache. That will result in a quite large cluster with a lot of RAM.
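To get a feel for how large "quite large" is, here is a rough sketch of the node count needed to keep every shard copy in the page cache. It assumes 64GB nodes with a 31GB heap (staying below the 32GB threshold mentioned later in this thread) and treats all remaining RAM as page cache, which overstates the real amount because the OS and other processes also use memory. The function name is hypothetical:

```python
import math


def nodes_to_cache_everything(data_gb, replicas, ram_per_node_gb, heap_gb):
    """Rough node count so all primary and replica data fits in the page cache.

    Assumption: all RAM not given to the JVM heap is available as file
    system cache (an optimistic simplification).
    """
    total_data_gb = data_gb * (1 + replicas)          # every shard copy
    cache_per_node_gb = ram_per_node_gb - heap_gb     # RAM left for the OS cache
    return math.ceil(total_data_gb / cache_per_node_gb)


# ~2TB of primary data, one replica, 64GB nodes with a 31GB heap
print(nodes_to_cache_everything(2000, 1, 64, 31))
```

Under these assumptions this comes to well over 100 nodes, which is why fast local storage is usually the more practical route.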

If that is not an option, you will probably need the fastest storage possible, as storage is often the limiting factor in Elasticsearch. I would therefore recommend hosts with fast local SSDs.

As for node count and the exact CPU and RAM specification, I believe this is something you need to benchmark to find out.

The following resources might be useful:

Aayushi_Jain · November 22, 2018, 9:20am

I watched the video and it has cleared up many of my doubts.
I will look into structured data and custom mappings.
One thing: in my scenario, new data will be ingested on a monthly basis, and only read operations will be performed daily. In this case, will creating replica shards give faster responses compared to using only the primary shard? Currently I have only 1 primary shard, and the replica shard is unassigned.

Also, one node can have a maximum of 16GB RAM, so does creating multiple nodes while maintaining the 1:16 ratio sound right to you?

Christian_Dahlqvist · November 22, 2018, 9:50am

Replica shards help with high availability and are also the way to increase query throughput. If you only have a single node, replicas cannot be assigned, as Elasticsearch will not assign multiple copies of the same shard to a single node.

Usually nodes go up to around 64GB RAM (as we recommend staying below 32GB heap and using 50% of RAM for heap), so I am not sure where you got this from.
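The single-node case above can be modelled with a small sketch. It captures only the rule that two copies of the same shard never share a node, and ignores everything else the allocator considers (disk watermarks, allocation awareness, filters). The function name is hypothetical:

```python
def unassigned_copies(num_nodes, primaries, replicas):
    """Count shard copies that cannot be assigned.

    Models only the same-shard rule: each node holds at most one copy
    (primary or replica) of a given shard.
    """
    copies_per_shard = 1 + replicas
    # A shard can place at most one copy on each node.
    assignable_per_shard = min(copies_per_shard, num_nodes)
    return primaries * (copies_per_shard - assignable_per_shard)


# The situation described above: 1 node, 1 primary, 1 replica
print(unassigned_copies(1, 1, 1))  # 1 copy left unassigned
```

The replica count is the dynamic index setting `index.number_of_replicas`, so it can be raised once more nodes join the cluster.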
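The rule of thumb above (heap at about 50% of RAM, but below 32GB) can be written as a small helper that emits the `-Xms`/`-Xmx` lines used in `config/jvm.options`. The 31GB cap is an assumed safety margin under the 32GB threshold, since the exact point where the JVM stops using compressed object pointers varies by platform. The function name is hypothetical:

```python
def jvm_heap_flags(ram_gb, cap_gb=31):
    """Heap flags following the thread's rule of thumb.

    Heap = half of RAM, capped at cap_gb (assumed margin below 32GB).
    Min and max heap are set equal, as is common for Elasticsearch.
    """
    heap_gb = int(min(ram_gb // 2, cap_gb))
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


print(jvm_heap_flags(64))  # ['-Xms31g', '-Xmx31g']
print(jvm_heap_flags(16))  # ['-Xms8g', '-Xmx8g']
```

This explains why RAM per node tends to top out around 64GB: beyond that, the extra memory only goes to the page cache, not the heap.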

Aayushi_Jain · November 22, 2018, 11:19am

So we will need multiple nodes to create replicas?
And is 64GB RAM for a node good in general?

Christian_Dahlqvist · November 22, 2018, 11:22am

Yes, a minimum of 3 nodes is required in order to achieve high availability.
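One way to see why 3 is the minimum: in 6.x, `discovery.zen.minimum_master_nodes` should be set to a majority of master-eligible nodes, `(N / 2) + 1`, to avoid split brain. A short sketch of how many node failures a cluster can survive under that quorum, assuming every node is master-eligible (function names are hypothetical):

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum for discovery.zen.minimum_master_nodes in 6.x."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible):
    """Master-eligible nodes that can be lost while still electing a master."""
    return master_eligible - minimum_master_nodes(master_eligible)


for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n), tolerated_failures(n))
```

With 1 or 2 nodes no failure can be tolerated; 3 nodes is the smallest cluster that survives losing one node.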

The amount of heap you require will depend on your data and your query patterns, so you need to test to find out. We recommend running with as small a heap as possible (as long as this does not cause issues), as this generally results in faster GC.

system · December 20, 2018, 11:22am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
