Performance optimisation for Time based indexes

Vivek_Vardhan · August 26, 2019, 4:22pm

Hi Team,
I am building an Elasticsearch (6.6.1) cluster, deployed on Azure Kubernetes. My current setup is:

Data volume: 25 GB per day (with 1 month of retention)
Search scenarios: dashboards only, a few aggregation queries, max 50 users a day

Current Configuration -

3 master-eligible nodes (8 GB RAM, 2 CPU cores, 100 GB hard disk each)
10 data nodes (8 GB RAM, 2 CPU cores, 512 GB (total to retain data for 1 month))
No Ingest nodes

Other settings -

The Java heap (-Xmx) is set to 3500m (as 50% of RAM was recommended).
Shards per index = 5 (default)
Replicas per shard = 2
Master nodes are NOT data nodes
Refresh interval for indices is set to 30s
Each day's data is stored in a separate index.
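
To put these settings in perspective, here is a rough back-of-the-envelope sketch in Python of the shard counts they imply. It assumes that the 25 GB per day is primary data (before replication) and that one month means 30 days; neither is confirmed above, and the function name is just for illustration.

```python
# Figures from the settings above; the two marked lines are assumptions.
DAILY_DATA_GB = 25      # ASSUMPTION: primary data size, before replicas
RETENTION_DAYS = 30     # ASSUMPTION: "1 month" taken as 30 days
PRIMARY_SHARDS = 5
REPLICAS = 2
DATA_NODES = 10


def shard_footprint(daily_gb, days, primaries, replicas, nodes):
    """Estimate shard counts and sizes for daily indices."""
    copies = 1 + replicas                    # each primary plus its replicas
    shards_per_index = primaries * copies    # shard copies per daily index
    total_shards = shards_per_index * days   # shard copies across retention
    return {
        "shards_per_index": shards_per_index,
        "total_shards": total_shards,
        "shards_per_node": total_shards / nodes,
        "primary_shard_gb": daily_gb / primaries,
        "total_disk_gb": daily_gb * copies * days,
    }


stats = shard_footprint(
    DAILY_DATA_GB, RETENTION_DAYS, PRIMARY_SHARDS, REPLICAS, DATA_NODES
)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Under those assumptions this works out to 15 shard copies per daily index, 450 across the retention window (45 per data node), and about 5 GB per primary shard.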

The data nodes are connected to a Kubernetes ingress service, which is the backend of an Azure Application Gateway.

Data is ingested into ES through the REST API provided by ES.

After running this cluster for 15 days, I suddenly started getting timeouts. The client calling ES was receiving 504 responses from the Azure App Gateway.

The search query we usually run is:
Index = prefix-2019-08* (which matches all indices created in August), with a term query.

Is it supposed to take this long (more than 20s)?
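
For reference, the request described above has roughly the following shape. This is only a sketch: the field name `status` and the value `error` are hypothetical placeholders, since the actual field is not given here, and the code only builds the request rather than sending it.

```python
import json


def build_term_search(index_pattern, field, value):
    """Build the path and JSON body for a term query over an index pattern."""
    path = f"/{index_pattern}/_search"
    body = {"query": {"term": {field: value}}}
    return path, body


# "status" / "error" are hypothetical; substitute the real field and value.
path, body = build_term_search("prefix-2019-08*", "status", "error")
print("GET", path)
print(json.dumps(body, indent=2))
```

With 5 primary shards per daily index, a pattern covering a full month fans out to one copy of each of up to 31 x 5 = 155 shards per search.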

During the timeouts, I made a few observations:

None of the data node / master node logs showed any errors.
One or two bulk rejections
Azure App Gateway was showing 5xx errors
RAM usage was close to 50% of the system (the ES Java process was fully using whatever -Xmx it was given)
CPU usage per pod was very low, around 0.1%
Cluster health was green.
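
One way to look at queuing and rejections more closely is the `_cat/thread_pool` API. The sketch below filters its JSON output for pools with queued or rejected tasks. The sample rows are made up purely to illustrate the shape of the response and are not from this cluster; the node names and the helper name are hypothetical. In 6.6 the pool that handles bulk requests is named `write`.

```python
import json

# Request that returns one row per node and thread pool.
CAT_THREAD_POOL_PATH = (
    "/_cat/thread_pool/search,write"
    "?format=json&h=node_name,name,active,queue,rejected"
)


def pools_under_pressure(cat_rows):
    """Return (node, pool, queue, rejected) for pools that queue or reject."""
    flagged = []
    for row in cat_rows:
        # _cat JSON output reports numbers as strings.
        queue = int(row["queue"])
        rejected = int(row["rejected"])
        if queue > 0 or rejected > 0:
            flagged.append((row["node_name"], row["name"], queue, rejected))
    return flagged


# ILLUSTRATIVE sample only: invented values, not real cluster output.
sample_rows = json.loads("""
[
  {"node_name": "data-0", "name": "search", "active": "2", "queue": "40", "rejected": "0"},
  {"node_name": "data-0", "name": "write",  "active": "0", "queue": "0",  "rejected": "2"},
  {"node_name": "data-1", "name": "search", "active": "0", "queue": "0",  "rejected": "0"}
]
""")

for entry in pools_under_pressure(sample_rows):
    print(entry)
```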

My guess is:
The queries were running for longer, and as the number of queries increased, request queuing increased, so the requests started timing out.

How can we solve this? Do I need to increase the number of data nodes?

Recommendations from blogs I read -

Use rolling indices (decrease the number of shards to 1 once writes stop)
Check for frozen indices (should not apply in my case, as I regularly search with * (all indices))
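
As I understand the first recommendation, reducing an index to 1 shard after writes stop is done with the shrink API in two steps: block writes and move a copy of every shard onto one node, then shrink into a new index. A minimal sketch of the two requests follows. The index names and node name are hypothetical (the daily naming format is assumed from the `prefix-2019-08*` pattern), and the code only builds the requests.

```python
import json


def shrink_requests(source, target, node_name, replicas=2):
    """Build the two REST calls used to shrink an index to a single shard."""
    # Step 1: block writes and require a copy of every shard on one node.
    prepare = (
        "PUT",
        f"/{source}/_settings",
        {
            "settings": {
                "index.routing.allocation.require._name": node_name,
                "index.blocks.write": True,
            }
        },
    )
    # Step 2: shrink into a new 1-shard index and clear the temporary settings.
    shrink = (
        "POST",
        f"/{source}/_shrink/{target}",
        {
            "settings": {
                "index.number_of_shards": 1,
                "index.number_of_replicas": replicas,
                "index.routing.allocation.require._name": None,
                "index.blocks.write": None,
            }
        },
    )
    return [prepare, shrink]


# Hypothetical names, for illustration only.
requests = shrink_requests(
    "prefix-2019-08-01", "prefix-2019-08-01-shrunk", "data-node-0"
)
for method, path, body in requests:
    print(method, path)
    print(json.dumps(body, indent=2))
```

The shrunk index has a different name from the source, so the target name should still match the search pattern, and the source has to be deleted afterwards to avoid searching the same data twice.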

Please suggest what I should check in my config, and what changes can be made.

system · September 23, 2019, 4:22pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.