I switched to the new 6.0 ELK stack this morning (starting from a brand-new instance rather than migrating), and while loading data I noticed that the server running Elasticsearch was running out of disk space.
After some digging, the problem appears to be that with the 6.0 stack the translog files are not cleaned up automatically; instead they remain in the directory that holds the index.
For comparison, I loaded 5 days' worth of records from my system into a 5.6.3 ELK environment and then did the same for a 6.0 ELK environment.
In the 5.6.3 environment, whilst Logstash was feeding records into Elasticsearch, the translog files were quite large. However, some time after Logstash finished (I'm not sure exactly how long, perhaps 60 seconds), these translog files disappeared and the total storage used by Elasticsearch roughly halved.
The screenshot below shows the folder size mapping for my Elasticsearch 5.6.3 area after this translog removal occurred:
Compare this to the Elasticsearch 6.0 area after the same amount of time following Logstash completion (actually I waited a good 10-15 minutes and nothing changed):
Has anyone else encountered this issue? For now I'll need to hold off migrating fully to the 6.0 stack, as I would simply run out of disk space under normal operation.
So in 5.6.3 these files used to disappear entirely after a certain period, and I'd be happy with the same behaviour in 6.0. Any idea how I can configure 6.x to work this way?
Or, perhaps better (to take advantage of the new functionality), how do I tune this so that it doesn't use as much storage? At the moment the translog is using roughly as much space as my data.
There are two settings mentioned in the blog post that you can tune down, although it is likely to make recovery slower. Whether this matters will depend on your cluster size and data volumes.
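As a minimal sketch, assuming the two settings are the 6.x translog retention settings index.translog.retention.size and index.translog.retention.age (documented defaults 512mb and 12h), lowering them on existing indices could look like the following. The values, the host URL and the helper names are hypothetical, and only the Python standard library is used:

```python
import json
import urllib.request

# Setting names from the Elasticsearch 6.x translog reference docs; assumed
# (not confirmed in this thread) to be the two settings the blog post means.
RETENTION_SIZE = "index.translog.retention.size"  # 6.x default: 512mb
RETENTION_AGE = "index.translog.retention.age"    # 6.x default: 12h


def retention_settings_body(size="64mb", age="1h"):
    """Return a flat-settings body for PUT /<index>/_settings.

    The default values are hypothetical examples, not recommendations.
    """
    return {RETENTION_SIZE: size, RETENTION_AGE: age}


def update_index_settings(host, index, body):
    """PUT the settings body to /<index>/_settings and return the parsed reply.

    Elasticsearch 6.x rejects request bodies without a Content-Type header,
    so it is set explicitly here.
    """
    req = urllib.request.Request(
        url=f"{host}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, update_index_settings("http://localhost:9200", "logstash-*", retention_settings_body()) would apply the smaller limits to every existing logstash-* index on a local node. Smaller limits keep less translog on disk; the trade-off is the slower recovery mentioned in the reply above.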
Hi - I forgot to add that I'm working on a single node only (this is not a PROD environment). Given that the blog post talks about synchronization across multiple nodes (which I don't have), is this perhaps why these files never shrink?
...I'll have a proper read of the blog post now but just wanted to make the above clear.
OK, I'll tweak the settings for my current installation. The blog post is useful, and it mentions the issue of significantly increased disk usage; thanks for pointing me to it.
I only seem to be able to set these parameters on existing indexes, whereas I'd like to configure Elasticsearch so that every new index it creates automatically has them applied.
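An index template is one way to get settings applied to newly created indices. The sketch below is a minimal example, assuming the same two translog retention settings and the 6.x index template API (PUT /_template/<name>, which matches indices via "index_patterns"); the template name, index pattern, values and helper names are hypothetical:

```python
import json
import urllib.request


def translog_template_body(patterns, size="64mb", age="1h", order=1):
    """Build an Elasticsearch 6.x index template body.

    6.x templates match index names via "index_patterns" (5.x used "template").
    The size/age values and the order are hypothetical examples.
    """
    if isinstance(patterns, str):
        patterns = [patterns]  # accept a single pattern as well as a list
    return {
        "index_patterns": list(patterns),
        # When several templates match, higher "order" values are merged last
        # and therefore win on conflicting settings.
        "order": order,
        "settings": {
            "index.translog.retention.size": size,
            "index.translog.retention.age": age,
        },
    }


def put_template(host, name, body):
    """PUT the template to /_template/<name> and return the parsed reply."""
    req = urllib.request.Request(
        url=f"{host}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, put_template("http://localhost:9200", "translog-retention", translog_template_body("logstash-*")) would register the template on a local node. Templates are only applied when an index is created, so indexes that already exist keep their current settings and still need the per-index update.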