Performance issue on 40TB index

Suresh_Ghatuwa · March 17, 2025, 3:13pm

Hello All,

I have a single 40 TB index containing about 6 billion documents.

My ES query fetches 100,000 unique values of the uniqueId field (using a terms aggregation) in a single request.

Currently, the index has 900 shards spread across 35 data nodes, and most of the time is spent on the coordinating nodes.
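For a rough sense of scale, the figures above can be plugged into a quick calculation. This is only a sketch: it assumes the terms aggregation uses the default `shard_size` (`size * 1.5 + 10`) and that every shard holds at least that many distinct ids, which may not be true for this data.

```python
# Back-of-the-envelope numbers from the figures quoted in this post.
INDEX_SIZE_TB = 40
DOC_COUNT = 6_000_000_000
SHARDS = 900
DATA_NODES = 35
REQUESTED_BUCKETS = 100_000  # "size" of the terms aggregation

gb_per_shard = INDEX_SIZE_TB * 1000 / SHARDS   # decimal TB -> GB
docs_per_shard = DOC_COUNT / SHARDS
shards_per_node = SHARDS / DATA_NODES

# Assumption: shard_size is left at its default of size * 1.5 + 10.
shard_size = int(REQUESTED_BUCKETS * 1.5 + 10)

# Upper bound on buckets the coordinating node may have to merge
# if every shard returns a full shard_size worth of buckets.
worst_case_buckets = SHARDS * shard_size

print(f"{gb_per_shard:.1f} GB per shard")
print(f"{docs_per_shard:,.0f} docs per shard")
print(f"{shards_per_node:.1f} shards per data node")
print(f"{worst_case_buckets:,} buckets to merge in the worst case")
```

Under those assumptions the coordinating node could receive on the order of a hundred million buckets for one request, which would be consistent with the reduce phase dominating.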

How can I configure the Elasticsearch cluster for better performance?

Please suggest.

Christian_Dahlqvist · March 17, 2025, 3:49pm

What type of data do you have in that index? What is the structure of the data? What does the query/aggregation look like? How many unique ids are there in total?

Which version of Elasticsearch are you using?

How have you determined this? What is the specification of the nodes? What exactly is the performance issue? What latency are you experiencing?

Is this the size of primary and replica shards? If so, how many replica shards do you have configured?

Suresh_Ghatuwa · March 25, 2025, 4:24am

@Christian_Dahlqvist

Please find my replies below:

What type of data do you have in that index? What is the structure of the data?

Currently, we store all the data (6.3 billion documents in total) in a single index. The data is flat, with around 500 fields per document, and different types of data are stored in the same index.

What does the query/aggregation look like?

We are aggregating the total amount for each unique id.
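The exact request was not posted, so the following is only a guess at its shape: a terms aggregation on `uniqueId` with a `sum` sub-aggregation. The field name `amount` and the aggregation names `by_id` and `total_amount` are assumptions made for illustration.

```python
import json


def build_request(size=100_000, id_field="uniqueId", amount_field="amount"):
    """Build a search body that sums amount_field per distinct id_field value.

    "uniqueId" comes from the thread; "amount" is a hypothetical field name.
    """
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "by_id": {
                "terms": {"field": id_field, "size": size},
                "aggs": {
                    "total_amount": {"sum": {"field": amount_field}}
                },
            }
        },
    }


body = build_request()
print(json.dumps(body, indent=2))
```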

How many unique ids are there in total?

There are around 11M unique ids in a single type.

Which version of Elasticsearch are you using?

We are using ES 7.17

How have you determined this?

We ran the ES Profile API and found that each shard responds in less than 1 second. Resource usage on the data nodes is low, while usage on the coordinating node is high.
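One way to make this comparison concrete is to subtract the slowest shard's profiled time from the overall `took` value. The sketch below assumes the usual profile response layout (`profile.shards[].searches[].query[]` and `profile.shards[].aggregations[]`, each with `time_in_nanos`); the sample response is hand-written with made-up values, not output from this cluster. As far as I know, the profile output covers shard-level execution only, so the remainder is a rough estimate that also includes queueing and network time.

```python
def shard_nanos(shard):
    """Sum top-level query and aggregation timings for one profiled shard."""
    query_ns = sum(
        q["time_in_nanos"]
        for search in shard.get("searches", [])
        for q in search.get("query", [])
    )
    agg_ns = sum(a["time_in_nanos"] for a in shard.get("aggregations", []))
    return query_ns + agg_ns


def slowest_shard_ms(response):
    """Return (shard id, milliseconds) for the shard with the largest profiled time."""
    worst = max(response["profile"]["shards"], key=shard_nanos)
    return worst["id"], shard_nanos(worst) / 1e6


# Hypothetical sample in the shape of a profiled search response.
sample = {
    "took": 42000,  # total request time in milliseconds
    "profile": {
        "shards": [
            {
                "id": "[node1][myindex][0]",
                "searches": [{"query": [{"time_in_nanos": 200_000_000}]}],
                "aggregations": [{"time_in_nanos": 500_000_000}],
            },
            {
                "id": "[node2][myindex][1]",
                "searches": [{"query": [{"time_in_nanos": 100_000_000}]}],
                "aggregations": [{"time_in_nanos": 300_000_000}],
            },
        ]
    },
}

shard_id, ms = slowest_shard_ms(sample)
unaccounted_ms = sample["took"] - ms  # time not explained by shard-level work
print(shard_id, ms, unaccounted_ms)
```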

What is the specification of the nodes?

We are using c7g.8xlarge (64 GB total memory and 32 vCPU) for data node and coordinating node.

What exactly is the performance issue? What latency are you experiencing?

Analyzing the profiling response, most of the time is spent on the coordinating node, and the overall response time from ES is high.

Is this the size of primary and replica shards? If so, how many replica shards do you have configured?

Currently we are using primary shards only (i.e. no replica shards). Would replica shards also improve performance?

Christian_Dahlqvist · March 25, 2025, 6:40am

What does CPU usage look like on the different node types when you run a query? What size and type of storage do you have attached?

Elasticsearch is generally limited by disk I/O and not CPU, so I tend to use memory optimised instances for Elasticsearch clusters unless I am running a lot of CPU heavy processing, e.g. complex ingest pipelines. Have you run iostat -x on the data nodes when you are querying to verify that the storage is not a limiting factor (the coordinating node can only process data as fast as it comes off the data nodes after all)?

Suresh_Ghatuwa · March 25, 2025, 7:43am

Regarding CPU usage: usage on the data nodes is normal, but CPU and memory usage on the coordinating nodes is high.
Currently, we are using the st1 disk type.

Please find the iostat -x output from one of the data nodes.

Christian_Dahlqvist · March 25, 2025, 8:52am

What is high and normal in terms of concrete numbers? How many cores are fully utilised?
