Handle 1 million inserts per second

We need to implement a really big data scenario with these features:

  • input data is 1 million logs per second
  • each log is roughly 100 bytes in size
  • we need to retain the data for 10 days
  • the system has 2 or 3 active users who run maybe 1,000 queries per day, so our scenario is definitely write-heavy

With these assumptions (many writes and a small number of reads), how much data should each node in the cluster store?


Is that 1 million events per second an average or a peak volume? What type of hardware are you planning to deploy this on? What type of data is being indexed?

1 million is the average volume.
The data is structured and has 10-15 fields; most of the columns are strings and integers.
My question is exactly that: what type of hardware should we use, and how much resource should we allocate to each node (for maximum utilization of each server, given our write-heavy scenario)?

If you want to keep up with the flow of data, you probably need to size the cluster based on the peak indexing rate rather than the average. The average will, however, tell you how much disk space you are likely to need in the cluster.
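To make that concrete, here is a rough back-of-envelope based purely on the numbers stated above; the replica count and the assumption that on-disk size roughly matches the raw source size are mine, and the real footprint after indexing and compression will differ:

```python
# Back-of-envelope disk estimate from the stated assumptions (not a sizing result):
# 1,000,000 events/s on average, ~100 bytes per event, 10 days retention.

events_per_second = 1_000_000
bytes_per_event = 100          # raw source size; on-disk size after indexing will differ
retention_days = 10
replicas = 1                   # assumption: one replica per primary shard

raw_per_day_tb = events_per_second * bytes_per_event * 86_400 / 1e12
raw_total_tb = raw_per_day_tb * retention_days
with_replica_tb = raw_total_tb * (1 + replicas)

print(f"raw ingest per day : {raw_per_day_tb:.1f} TB")    # ~8.6 TB/day
print(f"raw for 10 days    : {raw_total_tb:.1f} TB")      # ~86 TB
print(f"with one replica   : {with_replica_tb:.1f} TB")   # ~173 TB, before indexing overhead
```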

As it is an indexing-heavy use case and indexing is very I/O intensive, you will benefit from having data nodes with fast, locally attached SSDs. You will probably also need a good amount of CPU and a fast network.
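As an illustration only, these are the sort of per-index knobs usually looked at for write-heavy indices (shown here with the Python client; the index name, shard count and refresh interval are placeholder assumptions, and the indexing-speed tuning guide linked below is the reference to follow):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # hypothetical endpoint

# Example daily index tuned for heavy indexing; the values here are placeholders
# to be determined by benchmarking, not recommendations.
es.indices.create(
    index="logs-2018-08-20",
    body={
        "settings": {
            "number_of_shards": 6,        # placeholder; size via benchmarks
            "number_of_replicas": 1,
            "refresh_interval": "30s",    # less frequent refreshes help indexing throughput
        }
    },
)
```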

When estimating the size of the cluster needed, you will need to run some benchmarks on realistic data and hardware. The following resources may help:

https://www.elastic.co/elasticon/conf/2018/sf/the-seven-deadly-sins-of-elasticsearch-benchmarking

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right

https://www.elastic.co/guide/en/elasticsearch/reference/6.3/tune-for-indexing-speed.html


Is there any limitation on shard size or index size?

You need to find an optimal size through benchmarking as it depends on the data, queries and the use case. A shard can hold up to 2 billion documents, but you are likely to see query performance deteriorate before you get close to that. Each index can have many shards, so there is no strict limit there.
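As a hedged illustration of why the shard size needs benchmarking, here is a quick calculation using the thread's average rate and a purely assumed ~40 GB target shard size (a ballpark for the example, not a rule):

```python
# Assumed target of ~40 GB per shard; your benchmarks may point elsewhere.
events_per_day = 1_000_000 * 86_400             # 86.4 billion docs/day at the average rate
bytes_per_event_on_disk = 100                   # assumption: on-disk size roughly equals source size
daily_index_gb = events_per_day * bytes_per_event_on_disk / 1e9   # ~8640 GB/day
target_shard_gb = 40                            # assumed target, to be validated

primary_shards_per_day = daily_index_gb / target_shard_gb
docs_per_shard = events_per_day / primary_shards_per_day

print(f"primary shards per daily index : ~{primary_shards_per_day:.0f}")     # ~216
print(f"docs per shard                 : ~{docs_per_shard/1e6:.0f} million") # well under the 2 billion limit
```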

OK, I should measure the peak volume too.
We need to retain this data for 10 days, and I think SSDs for this data size would be very expensive.
Where can I find details on selecting servers and resource recommendations for Elasticsearch?

Given that you have a relatively short retention period, the data will not sit idle on disk for long, so I think you will need SSDs at least for all the indices being indexed into. You may want to consider a hot/warm architecture, but the retention period might be a bit short to properly warrant this.
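If you did go down the hot/warm route, the usual mechanism is shard allocation filtering on a node attribute. A minimal sketch, assuming hot and warm nodes are tagged with a `data` node attribute in elasticsearch.yml and using placeholder index names:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # hypothetical endpoint

# Assumes hot nodes are started with `node.attr.data: hot` and warm nodes with
# `node.attr.data: warm`; the attribute name is a convention, not fixed.

# New daily indices stay on hot (SSD) nodes while they receive writes...
es.indices.put_settings(
    index="logs-2018-08-20",
    body={"index.routing.allocation.require.data": "hot"},
)

# ...and older indices are relocated to warm nodes with cheaper storage.
es.indices.put_settings(
    index="logs-2018-08-18",
    body={"index.routing.allocation.require.data": "warm"},
)
```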

Are you looking to deploy on bare-metal hardware or in the cloud?

No, I don't.
OK, thank you very much.
