Best Way to Ensure Optimal Elasticsearch Performance

mibeyki · March 27, 2024, 2:05pm

Hello,

I am writing to seek guidance on how to ensure optimal performance of my Elasticsearch cluster.

I am running a three-node ES cluster with substantial resources on each node (16 vCPUs, 128 GB RAM, 10 TB disk).
I am collecting logs from 300+ servers. Indices are created with the naming pattern index-name-{now/d}-000001, with ILM managing rollover automatically. In practice this creates an index every day, and an index is rolled over after about 30 days or when it reaches 50 GB.
However, the number of indices and shards is growing too large (currently more than 3,000 shards), and I believe this is affecting the cluster: some queries keep timing out, and even saving settings, such as creating a new ingest pipeline, always fails with a 504 Gateway Timeout.
The Elasticsearch logs also show timeouts when connecting to the other cluster nodes.
Any ideas on how to optimize my cluster?
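The rollover behaviour described above maps onto an ILM policy. As a minimal sketch, the Python below only builds the JSON body that would be sent with `PUT _ilm/policy/<policy-name>`; the 30-day and 50 GB thresholds come from the post, while the function name, the choice of `max_primary_shard_size` (per primary shard) over `max_size` (whole index), and the optional delete phase are assumptions for illustration.

```python
import json


def build_rollover_policy(max_age="30d", max_primary_shard_size="50gb", delete_after=None):
    """Hypothetical helper: build an ILM policy body that rolls over on age or shard size."""
    phases = {
        "hot": {
            "actions": {
                # Rollover triggers as soon as EITHER condition is met.
                "rollover": {
                    "max_age": max_age,
                    "max_primary_shard_size": max_primary_shard_size,
                }
            }
        }
    }
    if delete_after is not None:
        # In the delete phase, min_age is measured from rollover
        # (or from index creation if the index never rolled over).
        phases["delete"] = {"min_age": delete_after, "actions": {"delete": {}}}
    return {"policy": {"phases": phases}}


if __name__ == "__main__":
    print(json.dumps(build_rollover_policy(delete_after="365d"), indent=2))
```

Whichever size condition is used, only the first condition reached triggers the rollover, so a low-volume source will still roll over on age and produce small indices.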

I hope I have described the issue clearly; pardon me if I haven't.

Christian_Dahlqvist · March 27, 2024, 2:11pm

What is the full output of the cluster stats API?

What is the retention period for your data?

What type of storage are you using? Local SSDs?

mibeyki · March 27, 2024, 2:43pm

Hi @Christian_Dahlqvist
Please see the info below:
What is the full output of the cluster stats API?
Sorry, I am not able to get the full output right now, but here is the output from when I ran the command earlier today.

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "elk",
  "cluster_uuid" : "vsdHTS8LQ2GRlsX7XQr_9Q",
  "timestamp" : 1711543091000,
  "status" : "yellow",
  "indices" : {
    "count" : 1604,
    "shards" : {
      "total" : 2603,
      "primaries" : 2543,
      "replication" : 0.023594180102241447,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 6,
          "avg" : 1.6228179551122
        },
        "primaries" : {
          "min" : 1,
          "max" : 3,
          "avg" : 1.5854114713216958
        },
        "replication" : {
          "min" : 0.0,
          "max" : 1.0,
          "avg" : 0.03449709060681629
        }
      }
    },

What is the retention period for your data?
1 yr
What type of storage are you using? Local SSDs?
datacentre HDD
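Even with only part of the output, a few figures can be derived from the `indices.shards` section. A minimal sketch in Python, assuming the response has already been parsed into a dict (for example with `json.loads`); the function name and the derived field names are hypothetical, and only the input keys come from the output above.

```python
def summarize_shards(stats: dict) -> dict:
    """Hypothetical helper: derive shard figures from a parsed GET _cluster/stats response."""
    nodes = stats["_nodes"]["total"]
    shards = stats["indices"]["shards"]
    total = shards["total"]
    primaries = shards["primaries"]
    replicas = total - primaries
    return {
        "status": stats["status"],
        "indices": stats["indices"]["count"],
        "replica_shards": replicas,
        # Same ratio the API reports as indices.shards.replication.
        "replicas_per_primary": replicas / primaries,
        # Average number of shard copies each node has to host.
        "shards_per_node": total / nodes,
    }
```

Fed the numbers above (`total` 2603, `primaries` 2543, three nodes), this gives 60 replica shards, a ratio matching the reported `replication` of about 0.0236, and roughly 868 shard copies per node.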

Christian_Dahlqvist · March 27, 2024, 2:56pm

The stats output is only partial, so there is not much to comment on there. I would need to see the rest to draw any conclusions.

Elasticsearch is often limited by disk I/O. I would recommend running iostat -x on the nodes and checking what await looks like; I would not be surprised if this turns out to be your main issue. Note that SSDs are recommended in both the search performance tuning guide and the guide for optimizing indexing throughput.
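Checking await on several nodes can be scripted. A minimal sketch, assuming the plain-text output of `iostat -x` from sysstat: column names differ between sysstat versions (older releases print a single `await` column, newer ones split it into `r_await`, `w_await` and others), so the parser reads the header row instead of hard-coding positions. The function names and the threshold are hypothetical, not an official recommendation.

```python
def parse_iostat_await(text: str) -> dict:
    """Map each device in `iostat -x` text output to its *await columns (milliseconds)."""
    result = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current table
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields  # older sysstat prints "Device:", newer prints "Device"
            continue
        if header is None or len(fields) != len(header):
            continue  # skip the banner line, the avg-cpu block and malformed rows
        row = dict(zip(header, fields))
        awaits = {
            col: float(val.replace(",", "."))  # some locales use a decimal comma
            for col, val in row.items()
            if col.endswith("await")
        }
        if awaits:
            result[fields[0]] = awaits  # later reports overwrite earlier ones
    return result


def slow_devices(parsed: dict, threshold_ms: float) -> list:
    """Return devices whose worst await column exceeds a (hypothetical) threshold."""
    return sorted(dev for dev, cols in parsed.items() if max(cols.values()) > threshold_ms)
```

The text can be captured with `subprocess.run(["iostat", "-x", "5", "3"], capture_output=True, text=True).stdout`. The first iostat report covers the time since boot, so with an interval the later reports are more representative; because rows for the same device are overwritten, the last report wins.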

It may also be worthwhile looking into how you handle sharding and index your data. If you are actively writing to a significant number of indices and shards, this may result in a lot of small writes and a correspondingly high number of IOPS, which may not be ideal for slower disks.
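To see how the indexing strategy multiplies shard counts over a retention period, here is a rough back-of-the-envelope sketch. It assumes a hypothetical stream with a steady daily volume and compares one new index per day against rollover on size (`max_primary_shard_size`) with an age cap (`max_age`); all function names and inputs are illustrative, and real volumes fluctuate.

```python
import math


def shards_with_daily_indices(streams, retention_days, primaries, replicas):
    """Shard copies retained if each stream gets a brand-new index every day."""
    return streams * retention_days * primaries * (1 + replicas)


def shards_with_rollover(gb_per_day, retention_days, max_primary_shard_gb,
                         max_age_days, primaries, replicas):
    """Rough shard copies retained for ONE stream that rolls over on size or age."""
    total_gb = gb_per_day * retention_days
    # Indices needed if every index fills its primaries up to the size limit.
    by_size = math.ceil(total_gb / (max_primary_shard_gb * primaries))
    # The age condition forces at least one rollover per max_age period.
    by_age = math.ceil(retention_days / max_age_days)
    indices = max(by_size, by_age)
    return indices * primaries * (1 + replicas)
```

For example, a stream producing 2 GB per day kept for 365 days comes out at 365 single-primary shards with daily indices, against 15 with 50 GB size-based rollover capped at 30 days.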

system · April 24, 2024, 2:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
