Upgrading ELK causes triple the write IOPS

Hello

We recently upgraded our company's ELK environment (on new servers) from 1.7.2 to 6.2.4.
Another change is that we previously used logstash-forwarder on the hosts, but now we use Filebeat to forward the logs.
In addition, we also changed the cluster setup:

  • Previous setup
    • 1 server installed with Kibana, Elasticsearch & Logstash
  • New setup
    • 2 Elasticsearch servers (nodes)
    • 1 server with Kibana & Logstash

The volume of logs stayed practically the same (slightly more), but there is a lot more parsing (splitting our logs into more fields and mapping some fields to integers). Apart from that, only the version and the setup have changed.

However, we notice that the old ELK stack (one server) averaged 175 write IOPS, while the new ELK stack shows the following write IOPS:

  • Master Elasticsearch node: 542 write IOPS
  • Data Elasticsearch node: 523 write IOPS

Is there any known reason for this, or is it a configuration mistake?

Any help would be appreciated

Thank you

More fields of more types can definitely contribute to this.

The storage configuration can also play a role. What is the storage setup on the old vs. the new system? In particular, does the new system use software RAID, especially RAID 5 or 6?

Also, what are your event rates?

Yes, I knew that more fields could be a reason for this, but not to this extent (the new cluster doesn't process three times as many logs, so it seems weird that the write IOPS are so much higher).
The storage configuration is exactly the same, no changes whatsoever.

The event rate on the new ELK stack varies from 500 to 2,500 events per second (with peaks up to 4,000). On the old one I can't know exactly, since we didn't have any monitoring enabled on it, but when I look at the indices the count of entries is the same (with a deviation of 5% max).
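For reference, I compare the entry counts with something along these lines (the logstash-* pattern is just a stand-in for how our indices are named):

    # List daily indices with their document count and size on disk
    GET /_cat/indices/logstash-*?v&h=index,docs.count,store.size&s=index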

In Elasticsearch 6.x more data types use doc values than in Elasticsearch 1.x. This reduces heap usage but also results in additional disk I/O and larger size on disk. You may also want to compare mappings between the two versions to see if there are any other changes.
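As a sketch, you can pull the mappings for comparison like this, and, purely as an illustration (the index name, template name, field name, and pattern below are hypothetical), switch doc values off for a field you never sort, aggregate, or script on. A second template with a higher order merges over the default Logstash one:

    # Pull the mapping of a comparable index on each cluster (index name is an example)
    GET /logstash-2018.06.01/_mapping

    # Hypothetical illustration only: disable doc values for a field that is
    # never sorted, aggregated, or accessed from scripts
    PUT /_template/logstash-doc-values
    {
      "index_patterns": ["logstash-*"],
      "order": 1,
      "mappings": {
        "doc": {
          "properties": {
            "session_id": {
              "type": "keyword",
              "doc_values": false
            }
          }
        }
      }
    }

Keep in mind this only affects indices created after the template change, and disabling doc values removes the ability to sort or aggregate on that field.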

You might also want to do a GET /NAME-OF-INDEX/_settings on a similar index on both systems and compare the value of refresh_interval. A new segment (which is basically a file) is created at each refresh, so a very short refresh interval could also be contributing to more write IOPS.
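For example (the index name is just a placeholder):

    # Check the refresh interval on a comparable index
    GET /logstash-2018.06.01/_settings

    # On 6.x you can also ask for the effective value including defaults
    GET /logstash-2018.06.01/_settings?include_defaults=true&filter_path=**.refresh_interval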

The older version has already been taken offline, but last week I changed this setting on the new cluster from 5s to 30s and it didn't really change anything for us.
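In case it matters, the change I made was roughly the following (the logstash-* pattern is only an example of our index names):

    # Raise the refresh interval on the existing indices
    PUT /logstash-*/_settings
    {
      "index": {
        "refresh_interval": "30s"
      }
    }

Newly created daily indices fall back to whatever the index template defines, so the template needs the same value for the change to stick.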

Thank you for your help

What do you mean by larger size on disk? With the new ELK cluster we use a lot less disk space for more data than before.
And what differences in mapping could cause higher write IOPS?

Thank you for your help

Good point. There is more data going into doc values, which increases the size on disk for those data types, but at the same time you no longer index an _all field, which saves a lot of space. If you use the best_compression codec you could save even more space.
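As a sketch (the template name and index pattern are placeholders), the codec can be set through an index template so that new indices pick it up; it cannot be changed on an open index:

    # Hypothetical template: enable best_compression for newly created indices
    PUT /_template/logstash-codec
    {
      "index_patterns": ["logstash-*"],
      "order": 1,
      "settings": {
        "index.codec": "best_compression"
      }
    }

The trade-off is somewhat more CPU when merging and when retrieving stored fields, in exchange for a smaller index on disk.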

One thing that has changed between 1.x and 6.x is that the transaction log is synced to disk much more frequently in order to improve resiliency, but I am not sure what impact this could have on the reported IOPS.
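If you want to verify or experiment with that, the translog settings can be inspected and, with the obvious resiliency trade-off, relaxed per index. The values below are only an illustration, and the index name is a placeholder:

    # Inspect the effective translog settings on a comparable index
    GET /logstash-2018.06.01/_settings?include_defaults=true&filter_path=**.translog

    # Illustration only: fsync the translog every 5s instead of on every request,
    # accepting the loss of up to 5s of data if a node crashes
    PUT /logstash-*/_settings
    {
      "index.translog.durability": "async",
      "index.translog.sync_interval": "5s"
    }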

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.