If my understanding is correct, we would need to keep this data in the cluster indefinitely. There is a chance they may not actually need all of it retained - I will get back on this.
When you ask how long we need to keep data in the cluster, do you mean that we would have to remove old data from the data nodes after some period of time?
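(For context: time-based removal like this is typically handled in Elasticsearch with an index lifecycle management (ILM) policy. Below is a minimal sketch using the official Python client; the policy name, the 90-day retention window, and the endpoint are all illustrative assumptions, not values from this thread.)

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Delete indices once they are older than 90 days; the policy name and
# retention window here are illustrative, not recommendations.
es.ilm.put_lifecycle(
    name="transactions-retention",
    policy={
        "phases": {
            "delete": {
                "min_age": "90d",
                "actions": {"delete": {}},
            }
        }
    },
)
```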
Not via bulk insert - there are multiple types of insertion here.
We have provided an API on our application nodes that triggers writes to the Elasticsearch servers.
These APIs are called by third-party customers (multiple customers) - they might upload an Excel sheet in their portal, which triggers our API once per row.
The same API is also available for third-party customers to submit single transactions. We receive these requests in parallel.
Combining the above two scenarios, we would get around 9k transactions per second.
If you do not use bulk inserts and require each indexed document to be searchable immediately, indexing will be very inefficient. This basically goes against most of the recommendations in this guide around optimizing indexing speed. It will result in a lot of small segments being generated and needing to be merged, which will put a lot of load on the cluster and cause a lot of disk I/O. In itself, indexing 9k documents of 1kB in size per second is achievable, but it might require more cluster resources and very fast storage.
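To illustrate the difference, here is a minimal sketch of server-side batching with the official Python client: per-row API calls are buffered and flushed as a single bulk request, and the refresh interval is relaxed so segments are created less often. The endpoint, index name, batch size, and refresh interval are all illustrative assumptions, and the exact `put_settings` signature may vary by client version.

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Relax the refresh interval so new documents are not made searchable
# after every single write; "30s" is an illustrative value.
es.indices.put_settings(
    index="transactions",  # illustrative index name
    settings={"index": {"refresh_interval": "30s"}},
)

BATCH_SIZE = 500  # illustrative flush threshold
buffer = []

def enqueue(doc):
    """Buffer one per-row API call; flush as a bulk request when full."""
    buffer.append({"_index": "transactions", "_source": doc})
    if len(buffer) >= BATCH_SIZE:
        flush()

def flush():
    """Send all buffered documents in one HTTP round trip."""
    if buffer:
        helpers.bulk(es, buffer)
        buffer.clear()
```

A real service would also flush on a timer so a partially filled buffer does not sit indefinitely, but the core idea is turning 9k single-document requests per second into a much smaller number of bulk requests.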
You also mention quite a high search rate. To optimize the search rate the cluster can support, you ideally want immutable data that is fully held in the operating system's file cache. Even if there were no indexing going on, you state that you are likely to have a very large amount of data. If this does not fit in the cache, queries will generate a lot of disk I/O, which leads to longer latencies and reduced query throughput.
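As a sketch of the "immutable data" idea: once an index no longer receives writes, it can be marked read-only and force-merged down to fewer segments, so queries touch fewer files and the OS file cache is used more effectively. The index name and endpoint below are assumptions for illustration.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

old_index = "transactions-2023-01"  # illustrative name for an index no longer written to

# Block further writes so the data is effectively immutable.
es.indices.put_settings(index=old_index, settings={"index.blocks.write": True})

# Merge down to a single segment so queries touch fewer files and the
# OS file cache serves more of the working set.
es.indices.forcemerge(index=old_index, max_num_segments=1)
```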
If you add these two together, you can see that the indexing will continuously add new data, which churns the page cache and makes it less efficient. I therefore do not think Elasticsearch is suitable for this use case (unless you can rework the requirements), and if you were to try to make it work, you would need a lot of hardware.
That sounds much more reasonable, but you will need fast disks and enough RAM for most of the data on disk to be cached. To determine whether you can meet the SLAs, you will need to benchmark with realistic data and operations. It sounds feasible with the right timing and bulk indexing.
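Elastic's Rally is the purpose-built tool for this kind of benchmarking, but as a rough hand-rolled sketch of what "benchmark with realistic data" means, the following indexes synthetic ~1kB documents in bulk and reports throughput. The document shape, counts, index name, and endpoint are made up; a real benchmark should use your actual data and query mix.

```python
import time
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

def docs(n):
    """Yield synthetic ~1kB documents; the shape is made up for the test."""
    for i in range(n):
        yield {"_index": "bench", "_source": {"id": i, "payload": "x" * 1000}}

start = time.monotonic()
ok, _errors = helpers.bulk(es, docs(100_000), chunk_size=5_000)
elapsed = time.monotonic() - start
print(f"indexed {ok} docs in {elapsed:.1f}s -> {ok / elapsed:,.0f} docs/s")
```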