10 billion records writen to ES ervery day,how many nodes and hardware should need?

kewn21 · March 24, 2016, 6:42am

100 thousand records per second writen in realtime from log files.Why has some suggest or experience?

wyhw · March 24, 2016, 7:15am

you need huge SSD, huge MEM, increases refresh_interval, increases bulk thread pool queuesize......

(nodes num * SSD size) > (N days * one day index size)

Christian_Dahlqvist · March 24, 2016, 7:29am

The given information is unfortunately not sufficient to provide any kind of estimate regarding required cluster size as this will depend on a number of additional factors. Some of these are:

What type of data are you indexing?
What is the average size of a record, ideally once it has been converted to JSON?
What are your requirements around retention period for your data?
How are you going to query your data?
How frequently are you going to query your data, e.g. what is the expected number of queries per second?
What are your latency requirements when querying?
Do you have any limitations around what type of hardware you can use, especially storage?

kewn21 · March 24, 2016, 8:19am

Average size of a record is 750bytes,retention period is a week.
Detail query with limit order and aggregation query,each split the total queries which is no more that 100qps.
Querying latency in 10s,the time interval for record to be effective is 60s.
Storage is sata hard disk, 11 hard disks and 50g mem and 16cores per node.

Christian_Dahlqvist · March 24, 2016, 9:48am

If I calculate correctly that means that you need to index around 72MB of raw data per second, which corresponds to close to 6TB of raw data per day. As indexing typically is I/O intensive due to the merging of segments that take place in the background, it is generally recommended to equip indexing nodes with SSDs. Given the volumes here and the fact that the data also need to be queried at a reasonably high rate, which will compete for resources, I think it will be difficult to give an estimate without performing any benchmarks. This chapter from the Definitive Guide discusses tuning the cluster for indexing performance. There are a few settings under 'Segments and Merging' that you will need to adjust depending on the type of disks you have. This talk from Elastic{ON} might also be of interest as it talks about performing benchmarks in order to determine cluster size.

thn · March 24, 2016, 9:56am

In theory, each shard can hold upto 2B records so in your case, you would need a daily index with more than 5 shards to begin with.

I would like to suggest the following so you can take some measurements before deciding what you should have in order to support your business requirements realistically.

index 2B records into one index with 4 shards (distributed roughly 500M records per shard) and record the index size, run your queries and record the response times. For the same query, I suggest to run it at least 3 times and record the response time per run.
repeat with 4B records with one index/4 shards (distributed roughly 1B records per shard)
repeat with 10B records with one index/10 shards (distributed roughly 1B records per shard) If the response time here does not meet your requirements, increase the number of shards to 15 or 20.

As you increase the number of shards, you'll find out how much disk space you need and how many nodes you need to spread your cluster. The HW numbers you gave above won't be able to handle your business requirements.

warkolm · March 24, 2016, 9:19pm

You'll need a lot of those to deal with the load!

Topic		Replies	Views
Huge Cluster Elasticsearch	5	982	July 5, 2017
Handle 1 million inserts per second Elasticsearch	9	4871	August 8, 2018
Elastic server hardware requirements Elasticsearch	9	1386	December 20, 2018
Determine Cluster Size Needed Elasticsearch	4	712	June 28, 2017
Elastic cluster hardware estimation Elasticsearch	3	544	May 30, 2018

10 billion records writen to ES ervery day,how many nodes and hardware should need?

Related topics