As my other post explained, our 3-node cluster fried 2 SSDs in 4 hours. We are currently investigating what really happened.
Our System:
3 nodes, Elasticsearch 7.17.3, Kibana 7.17.3. Every node has a 2 TB SSD and 32 GB of RAM. Node 1 receives all the data and distributes it to the rest. The drives are used ONLY by Elasticsearch; nothing else is stored on them. Elasticsearch has 16 GB of RAM to itself, and no other tasks run on the machines. The machines are physical, not virtual.
We now have the cluster back up, and every disk on every node shows a constant 1 MB/s of writes, even though we index at most 50 KB/s.
Elasticsearch, or whatever is behind this, is writing about 850,000 bytes/s (~830 KB/s). That's roughly 17 times our actual indexing rate and works out to about 3-4 GB of disk writes per hour. The biggest of our 5 indexes is 8 GB, and it has been collecting data for multiple months. So this is INSANE.
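A quick back-of-the-envelope calculation of what that constant write rate means for the drives, using only the figures quoted above (a rough sketch, not measured values):

# Back-of-the-envelope: turn the observed write rate into hourly/monthly volume.
# The two rates below are the figures quoted above, not freshly measured values.
observed_write_rate = 850_000   # bytes/s of disk writes seen on each node
indexing_rate = 50_000          # bytes/s of data actually being indexed

amplification = observed_write_rate / indexing_rate
per_hour_gb = observed_write_rate * 3600 / 1e9
per_month_tb = observed_write_rate * 3600 * 24 * 30 / 1e12

print(f"write amplification: ~{amplification:.0f}x")            # ~17x
print(f"writes per hour:     ~{per_hour_gb:.1f} GB")            # ~3.1 GB
print(f"writes per month:    ~{per_month_tb:.2f} TB per disk")  # ~2.2 TB

At that pace each disk absorbs on the order of 2 TB of writes per month, which is the number to compare against the drives' rated endurance.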
Main usage:
D:$LogFile
D:$Mft
two of our indexes, but their written bytes match the expected data usage
sometimes a huge pagefile entry shows up, but it is gone within seconds
We experience this on every disk and on every node. We think this is the main reason our SSDs fail so quickly: they are at 20-40% of their write capacity all the time.
To make this even more interesting, Node 3 receives about 50 KB/s of data over Ethernet, but its disk writes 16-20 times that (800-1000 KB/s). Where does this extra data even come from?
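One way to narrow down where the extra writes come from (a rough sketch, not something we have run on this cluster): compare what the Elasticsearch process itself reports as written bytes against what the machine as a whole writes, for example with Python and psutil. The process matching below is an assumption; adjust it for your setup.

# Sketch: compare Elasticsearch's own write counters with whole-machine disk writes.
# Assumes psutil is installed and Elasticsearch runs as a local java process;
# adjust the matching and the sample window for your environment.
import time
import psutil

def find_es_process():
    for p in psutil.process_iter(["name", "cmdline"]):
        name = (p.info["name"] or "").lower()
        cmdline = p.info["cmdline"] or []
        if "java" in name and any("elasticsearch" in c.lower() for c in cmdline):
            return p
    raise RuntimeError("Elasticsearch process not found")

es = find_es_process()
io0, disk0, t0 = es.io_counters(), psutil.disk_io_counters(), time.time()
time.sleep(60)  # sample for one minute
io1, disk1, t1 = es.io_counters(), psutil.disk_io_counters(), time.time()

dt = t1 - t0
es_kbps = (io1.write_bytes - io0.write_bytes) / dt / 1024
all_kbps = (disk1.write_bytes - disk0.write_bytes) / dt / 1024
print(f"Elasticsearch process writes: {es_kbps:.0f} KB/s")
print(f"All disks combined:           {all_kbps:.0f} KB/s")

If the per-process number stays close to the indexing rate while the disk total is much higher, the extra writes are coming from outside the Elasticsearch process itself, e.g. NTFS metadata updates, which Windows typically attributes to the System process rather than to Elasticsearch.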
When we turn Elasticsearch off, it drops to 0, which is why we think it is an Elasticsearch problem and not a Windows problem.
Can somebody explain why this happens and how we can prevent it? We don't know what is causing this, and it is killing our cluster for the second time now; our SSDs are already at 40% health. We don't know what to do. Thanks!
TL;DR
We index 50 KB/s, but all drives are writing roughly 20 times more (~1000 KB/s).
This coincides with another user's post from a couple of weeks back:
No idea what's going on here, sorry, Elasticsearch isn't directly interacting with the $LogFile or $Mft files so it's unclear why they're seeing so much churn. I've asked some of our Windows experts to take a look.
Both $LogFile and $Mft are parts of Windows NTFS, so while they're associated with the usage of Elasticsearch, it is really Windows/NTFS doing this work, not Elasticsearch. If you look at your breakdown, Elasticsearch is really using roughly 70 KB/s (the last 2 lines for PID 7028); while slightly higher than 50 KB/s, that is still within a reasonable range of it. The rest of the usage is just NTFS doing work on behalf of the Elasticsearch process. This probably comes down to something on the Windows/NTFS side being the issue.
I'd suggest you search online for things like "$LogFile causing high disk usage" or "$Mft causing high disk usage" and see what other people are talking about or suggesting. Doing a quick search myself, I found "What Is NTFS Volume Log ($Logfile) And Why Is It Causing Disk Activity In Windows 7", which seems to provide some suggestions (note: I have not tried these suggestions myself and would advise caution if you use them; always test somewhere other than production if possible).
Another question: do you have any custom (non-default) settings on the Elasticsearch side? While I wouldn't expect any Elasticsearch settings to cause this issue, knowing the non-default settings might provide a bit more insight.
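If it helps, here is one way to dump whatever has been set explicitly, as a small sketch that assumes the cluster is reachable on http://localhost:9200 without authentication (adjust for your security setup):

# Sketch: list explicitly-set (non-default) Elasticsearch settings.
# Assumes http://localhost:9200 with no auth/TLS; adjust for your cluster.
import json
import requests

base = "http://localhost:9200"

# persistent/transient settings applied through the cluster settings API
cluster = requests.get(f"{base}/_cluster/settings").json()
print(json.dumps(cluster, indent=2))

# per-node settings coming from elasticsearch.yml and the command line
nodes = requests.get(f"{base}/_nodes/settings").json()
for node in nodes["nodes"].values():
    print(node["name"])
    print(json.dumps(node["settings"], indent=2))

Anything you have configured explicitly should show up in those two responses.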
We do not. Everything is stock ELK. We get the data from Redis and transport it to the cluster via Logstash; the cluster is configured with the standard .yml. Here is our jvm.options:
## JVM configuration
################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################
## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 10-13:-XX:-UseConcMarkSweepGC
# 10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30
## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=data
# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=logs/hs_err_pid%p.log
## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m
# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
-Xmx16318m
-Xms16318m
Yeah, I also think it's in a reasonable range, but it still seems a bit much.
Some indices show higher writes than their actual size. dN_O0 is the .kibana_task_manager index and writes 250 MB/hour even though it is only 70 MB in size. Same for tSV8Y, which is metricbeat: it's only 130 MB but writes twice that in an hour.
I posted the question in a Windows forum too; let's see if they have any ideas.
I think that's expected: this index tracks the state of the managed tasks, so it will mostly see update traffic (i.e. writes which do not increase its size). Still, 250 MB/hr is pretty small and doesn't seem like it should be a problem.
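For anyone who wants to verify this kind of update-heavy pattern on their own cluster, a rough sketch (same localhost/no-auth assumption as above) that compares an index's on-disk size with the number of write operations it has absorbed:

# Sketch: spot update-heavy indices by comparing on-disk size with
# cumulative write (index + delete) operations.
# Assumes http://localhost:9200 with no auth; adjust for your cluster.
import requests

base = "http://localhost:9200"
pattern = ".kibana_task_manager*"  # example from this thread; any index works

url = f"{base}/{pattern}/_stats/indexing,store?expand_wildcards=open,hidden"
stats = requests.get(url).json()
for name, data in stats["indices"].items():
    total = data["total"]
    size_mb = total["store"]["size_in_bytes"] / 1e6
    ops = total["indexing"]["index_total"] + total["indexing"]["delete_total"]
    print(f"{name}: {size_mb:.0f} MB on disk, {ops} cumulative write ops")

A small index with a large and steadily growing operation count is seeing exactly the update traffic described above.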