Performance Problems

Good morning,

I have several Elasticsearch clusters and they are giving me performance problems. I think I have located the problems, but I would like you to confirm whether I am on the right track.

Basic configuration normally used:

As many shards as there are ingestion nodes, plus 1 replica of the data; 40GB indices with rotation by size; a single index set.

1. One index set for all data (whether 30TB or 200TB of data), everything goes to the same index set.

2. Total number of shards in the clusters: I normally set indices to 40GB, with 4, 6 or 8 shards depending on the number of ingest nodes and the retention I need, so sometimes I have 9,000 shards, 12,000 shards...

3. Normally I configure as many shards as ingestion nodes (to optimize ingestion, in theory), but I do not merge the shards once I close the index. I have seen that it can be done with shrink or merge, but that normally requires moving the data to another index set, which is a problem for me right now. Is there a way to reduce the number of shards once the index is closed, leaving only the X shards in the deflector?

Normally each Elasticsearch node has 8 CPUs and 60GB of dedicated RAM, and there are around 800-1,000 fields in total. The environments are currently on Kubernetes, but I also have environments without Kubernetes. Version 7.17.3.

I appreciate any help. I have the clusters closely monitored with Grafana and other tools, so any data you need is at your disposal.

Thanks, regards!

Hello again, I'll provide some more information in case it helps you point me in the right direction.

Roughly, the Elasticsearch clusters have 6 HOT nodes, each with 60GB of RAM, 8 CPUs and 4TB of disk, and another 10 WARM nodes, each with 60GB of RAM, 4 vCPUs and 9TB of disk, on a Dell EMC Unity 480XT array...

Thanks, regards!

When it comes to shard sizing the recommendation is generally to aim for an average shard size between 30GB and 50GB. It is the shard size and not the index size that is important. If you have an index with 6 primary shards and 1 replica configured, the total size of the index should be between 360GB and 600GB. It sounds like you may have far too many small shards in your cluster, which can affect performance.
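
If you want a quick way to check where you currently stand, something like this would give you the average primary shard size across the cluster (a rough sketch using Python and the requests library against the _cat/shards API; the localhost URL is a placeholder, add authentication as your cluster requires):

```python
# Rough sketch: compute the average primary shard size via the _cat/shards API.
# The localhost URL is a placeholder; add auth/TLS as your cluster requires.
import requests

resp = requests.get(
    "http://localhost:9200/_cat/shards",
    params={"format": "json", "bytes": "b", "h": "index,prirep,state,store"},
)
resp.raise_for_status()

# Only count started primary shards that report a store size.
primaries = [
    s for s in resp.json()
    if s["prirep"] == "p" and s["state"] == "STARTED" and s["store"] is not None
]
total_bytes = sum(int(s["store"]) for s in primaries)
avg_gb = total_bytes / len(primaries) / 1024**3

print(f"{len(primaries)} primary shards, average size {avg_gb:.1f} GB")
```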

If you are using a hot/warm architecture it is also important that the hot nodes, which perform all the I/O intensive indexing work, have really fast disks (this can easily become the bottleneck), ideally local fast SSDs. If you are using some type of networked storage for all node types, it may be that it is not performant enough.

Hello again,

First of all, thank you very much for your help. The hot nodes are on local SSD disks in a Unity 480XT array and the warm nodes are on NL-SAS disks in the same array, all connected at 10Gb/s.

I'm surprised that you talk about shard sizes of 30-50GB and indices of 360GB to 600GB; I thought that the ideal shard size was around 10GB.

Could this have more impact on performance than the other things I mentioned above, such as having everything in the same index set?

Thanks again, regards!

Ideal shard size depends on use case. As outlined in the documentation, a shard size between 20GB and 40GB is common, and I have seen a maximum of around 50GB recommended.

I understand, then, that having many small shards (around 7-10GB per shard) can have more impact on performance than having only one 25TB index set. I believed that the biggest loss of performance in my clusters came from that, from not splitting the information into different index sets.

The heap ratios that I am using (150:1) are above what you usually recommend, I know, but the clusters contain a large amount of information and the resources get out of hand.

We are talking about storing logs, around 1.5KB per message, with 30-day retention and volumes of 20,000-60,000 EPS; that is TBs and even PBs of data. It is true, though, that of those 30 days only 3 are on HOT nodes; the rest are moved to warm nodes on a schedule.

Thanks again, regards!

Hello again,

I'm attaching graphs of the cluster so that you have a more global view, in case you see any parameters of interest.

Thanks, regards!

@Christian_Dahlqvist

Do the graphs I left you tell you anything?

Would your recommendation then be to make the shards and indices larger?

Thanks for helping, regards!

What exactly is the nature of the performance problems you are experiencing? Is querying slow? Is it different for new and old data? Any other patterns? Is indexing throughput the problem?

Hello again!

We noticed several things,

  • The most serious issue is that for searches spanning more than 5-7 days, the cluster responds very badly.

  • When the active index is rotated, the throughput drops a lot, almost to a minimum.

  • We sized the cluster, with 6 HOT nodes (8 vCPUs/60GB RAM each, with SSD disks in the array) and 10 WARM nodes (4 vCPUs/60GB RAM, with NL-SAS disks in the array), to ingest about 25,000 EPS with 3 days in hot and 27 in warm, but it is not capable of regularly ingesting more than 16-18k EPS, and it drops to 1,000-2,000 EPS for many minutes when the active index rotates.

Thank you very much for any improvements or suggestions, regards!

Do you have dedicated master nodes deployed? If so, what is the number and specification of these?

Is there anything in the logs around issues propagating cluster state?

How many indices and shards are you actively indexing into?

Is performance bad if you query a time period of 5-7 days that is far enough in the past to entirely be located on the warm nodes?

What does query performance look like when querying only data residing on the hot nodes?

What do I/O stats, e.g. await and disk utilisation, look like on the warm and hot nodes respectively?
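
If pulling iostat from every host is awkward, the node stats API also exposes basic per-node I/O counters. A rough sketch (the localhost URL is a placeholder; io_stats is only reported on Linux, and the counters are cumulative since node start, so sample twice and diff to get a rate):

```python
# Rough sketch: print per-node I/O counters from the node stats API.
# The localhost URL is a placeholder; io_stats is only reported on Linux,
# and the values are cumulative since the node started.
import requests

resp = requests.get("http://localhost:9200/_nodes/stats/fs")
resp.raise_for_status()

for node in resp.json()["nodes"].values():
    io = node["fs"].get("io_stats", {}).get("total", {})
    print(
        node["name"],
        "read_ops:", io.get("read_operations"),
        "write_ops:", io.get("write_operations"),
        "read_kb:", io.get("read_kilobytes"),
        "write_kb:", io.get("write_kilobytes"),
    )
```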

Hello again!

Do you have dedicated master nodes deployed? If so, what is the number and specification of these?

- Yes, 3 master nodes with 2 vCPUs/4GB RAM each and 3 coordinating nodes with 2 vCPUs/4GB RAM each

Is there anything in the logs around issues propagating cluster state?

- At the log level, no errors of any kind are seen.

How many indices and shards are you actively indexing into?

- 900 indices with 6 shards per index, only one active index, and everything in a single index set

The most serious issue is that for searches spanning more than 5-7 days, the cluster responds very badly.

Is performance bad if you query a time period of 5-7 days that is far enough in the past to entirely be located on the warm nodes?

- Those 5-7 days are currently all on the hot tier, because the cluster is not yet at 75% disk usage since ingestion is gradually ramping up the EPS level. In other words, yes, the data currently being queried is on the HOT nodes with 8 vCPUs/60GB RAM and SSD disks in the array (6 nodes of this type).

What does query performance look like when querying only data residing on the hot nodes?

- The 3-5 days I mention are in hot nodes.

What do I/O stats, e.g. await and disk utilization, look like on warm and hot nodes respectively?

I'm attaching a screenshot of the array-level IOPS/MB/s data for the LUNs of the hot nodes


Each LUN contains two HOT Elasticsearch nodes, so there are 3 LUNs, as you can see

Thanks, regards!

I am not sure this would be logged as an error, so you may need to search all log levels.

I do not understand what you mean. Could you please further explain how you organise indices and shards?

Are you using time-based indices? If so, are you using rollover or index names with timestamps?

If you are using time-based indices it is generally only the latest one that receives data to index. Is this the case for your use case? How many indices are you indexing into at any point in time? How many shards do these indices have?

OK, so the poor performance is related to the hot nodes then.

I am not sure how to interpret this as it looks like used IOPS levels are very low.

How are you indexing data into Elasticsearch? Are you using Logstash? Filebeat? Some other tool? What bulk size are you using? Which nodes are indexing requests sent to?

I am not sure this would be logged as an error, so you may need to search all log levels.

I did not search only for errors. Looking at the logs of the Elasticsearch nodes, only messages about index rotation from hot to warm nodes and the like appear; apparently nothing of interest.

I do not understand what you mean. Could you please further explain how you organize indices and shards?

Are you using time-based indices? If so, are you using rollover or index names with timestamps?

If you are using time-based indices it is generally only the latest one that receives data to index. Is this the case for your use case? How many indices are you indexing into at any point in time? How many shards do these indices have?

I will try to give as much detail as I can. Currently it is a single index set with 900 indices and 6 shards per index. Index rotation is fixed by size (40GB) and the data is stored based on time, because what we store are logs and these are inserted according to their timestamp. Currently on this platform there are approximately 600 different fields and, yes, only the latest index receives data. Only one index is open and ingesting data, with 6 shards (1 shard per hot node).

OK, so the poor performance is related to the hot nodes then.

I am not sure how to interpret this as it looks like used IOPS levels are very low.

How are you indexing data into Elasticsearch? Are you using Logstash? Filebeat? Some other tools? What bulk size are you using? Which nodes are indexing requests sent to?

In front of Elasticsearch there is a Graylog. The size of the volumes is as follows:

4TB per node on HOT nodes
9 TB per node on WARM nodes

I mean the data disks for the nodes.

Sorry if I don't answer some of the questions as well as you need; if you need me to specify anything further or I'm not explaining myself clearly, please tell me.

Does this mean that you have a single index pattern that matches all of these and either a timestamp or a sequence number that distinguishes the indices? Can you show stats from the _cat/indices API for a few of these? Is it 6 primary shards or 3 primary shards and 3 replica shards?
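
For reference, something along these lines would pull those stats (a rough sketch using Python and the requests library; the graylog_* index pattern and the localhost URL are placeholders for your environment):

```python
# Rough sketch: dump size and shard counts for indices matching a pattern.
# "graylog_*" and the localhost URL are placeholders; adjust to your naming.
import requests

resp = requests.get(
    "http://localhost:9200/_cat/indices/graylog_*",
    params={
        "format": "json",
        "bytes": "gb",
        "h": "index,pri,rep,docs.count,pri.store.size,store.size",
        "s": "index",
    },
)
resp.raise_for_status()

for idx in resp.json():
    print(
        idx["index"],
        "primaries:", idx["pri"],
        "replicas:", idx["rep"],
        "docs:", idx["docs.count"],
        "primary GB:", idx["pri.store.size"],
        "total GB:", idx["store.size"],
    )
```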

Good morning,

I'm attaching some screenshots of the cluster; there are 6 shards and 1 replica.

Thanks, regards!


It looks like indices are around 70GB in size. Given that each index has 6 primary and 6 replica shards you have an average shard size under 6GB, which is very small given the data volumes you are ingesting. I would not be surprised if this is contributing to your performance problems around querying. If you were to instead aim for an average shard size of 40GB, which I would consider reasonable, each index would instead be around 480GB in size (240GB of primary shards and 240GB of replica shards). I would recommend changing this and seeing what impact it has on query performance. This will only apply to new indices, so you may need to use the shrink index API to reduce the shard count for existing indices.
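
For reference, the shrink flow for a single index looks roughly like this (a sketch with Python and the requests library; the index and node names are placeholders, and since Graylog manages these indices you should check how it handles a shrunk/renamed index before touching a live index set):

```python
# Rough sketch of the shrink sequence for one index.
# Index and node names are placeholders; verify how Graylog treats the shrunk index.
import requests

ES = "http://localhost:9200"
SOURCE, TARGET = "graylog_123", "graylog_123_shrunk"

# 1. Block writes and gather a full copy of every shard on a single node.
requests.put(f"{ES}/{SOURCE}/_settings", json={
    "index.blocks.write": True,
    "index.routing.allocation.require._name": "hot-node-1",  # placeholder node name
}).raise_for_status()

# (wait here until shard relocation has finished, e.g. via the cluster health API)

# 2. Shrink into a new index; the target primary count must be a factor of the source's.
requests.post(f"{ES}/{SOURCE}/_shrink/{TARGET}", json={
    "settings": {
        "index.number_of_shards": 1,
        "index.number_of_replicas": 1,
        "index.routing.allocation.require._name": None,  # clear the allocation filter
        "index.blocks.write": None,                       # remove the write block on the target
    },
}).raise_for_status()
```

Once the new index is green and searchable you can delete the original; repeat per index as needed.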

I am not familiar with Graylog so cannot give advice on how to tune this. If increasing the average shard size does help with query performance and you are still seeing issues with indexing throughput while IOPS is low, it is possible that Graylog indexing is actually the bottleneck. It may not be optimised for the volumes you are indexing and may use too low concurrency or too small bulk requests.

Also note that indexing can be resource heavy and will take resources away from querying. Having 4TB of data on hot nodes that are indexing at a high rate may not leave sufficient resources for fast querying.

Good afternoon Christian_Dahlqvist

First of all, thank you for your time and patience. As a first measure, to verify that it is not a performance problem at the Elasticsearch level, you suggest changing the configuration to 40GB shards and seeing how the platform behaves. I understand that for the indices that are already stored I should look for a way to reduce their shards, as you mention.

Regarding Graylog, as I understand it the indexing of the information is carried out by Elasticsearch; Graylog sends the information and gives the order to rotate the active index based on the configuration defined in it, but the optimization of the information and the indexing is done by Elasticsearch.

The storage array that supports the platform is a Unity 480XT and it is dedicated to this infrastructure, so at a performance level it is very comfortable and can provide many more IOPS than Elasticsearch is currently using.

Do you think that changing the topology of the infrastructure in some way could make it considerably better? Smaller hot nodes or something? Our main idea for improving performance was to separate the information into more than a single index set, but even though we keep emphasizing it, you do not mention it in any of your posts. Is this topic really so irrelevant?

Thanks, regards!

I suspect this may help with querying performance. The number of indices and shards you are currently indexing into seems fine.

If you are seeing low IOPS and no large iowait on the nodes I suspect it is possible the bottleneck is within Graylog. I would therefore not immediately jump at making changes in the Elasticsearch cluster.

I would look into how Graylog sends data to Elasticsearch. It may be that it is using too small bulk requests or not sending data with sufficient parallelism to properly put load on the cluster.
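
One way to tell the two apart would be to push synthetic documents straight into the cluster with a known bulk size and compare that rate with what Graylog achieves. A rough, single-threaded sketch (index name, document shape, bulk size and URL are all placeholders; delete the test index afterwards, and add parallel senders to find a realistic ceiling):

```python
# Rough sketch: measure raw bulk indexing throughput against the cluster, bypassing Graylog.
# Index name, document shape, bulk size and URL are placeholders.
import json
import time
import requests

ES = "http://localhost:9200"
INDEX = "bulk-throughput-test"   # throwaway index; delete it after the test
BULK_SIZE = 5000                 # documents per bulk request
BATCHES = 20

doc = {"message": "x" * 1500, "source": "bulk-test"}   # ~1.5 KB, similar to your log size
action = json.dumps({"index": {"_index": INDEX}})
body = "\n".join(f"{action}\n{json.dumps(doc)}" for _ in range(BULK_SIZE)) + "\n"

start = time.time()
for _ in range(BATCHES):
    resp = requests.post(
        f"{ES}/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
    )
    resp.raise_for_status()
    if resp.json().get("errors"):
        raise RuntimeError("bulk response reported item-level errors")

rate = BULK_SIZE * BATCHES / (time.time() - start)
print(f"indexed {BULK_SIZE * BATCHES} docs at roughly {rate:.0f} docs/second")
```

If the raw rate from a test like this comfortably exceeds what you see through Graylog, the bottleneck is most likely in how Graylog batches and parallelises its output rather than in Elasticsearch itself.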