We use Graylog with a 10-node Elasticsearch backend.
ES version: elasticsearch-6.6.0 (the same problem occurred with previous versions as well)
Snapshot repository: a local "fs" repository on an NFS share mounted to a local folder
OS: CentOS 7
10 Elasticsearch nodes in one cluster, with no dedicated roles set.
The problem:
We run snapshots every night from a cron job. Before taking new snapshots, the job deletes the old ones, so disk usage stays constant. We take one snapshot per index.
But if we update a host's OS (the ES version does not change) and reboot it (e.g. after a kernel update), the next nightly snapshot job eats all the space: it takes every snapshot from the beginning, writing all the data to disk again.
If we don't update the nodes, it can run for months without any problem.
ES starts automatically, but the NFS share is mounted by hand, so after a reboot ES starts with an empty repository. We also tried restarting ES after mounting the share.
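For reference, the create and delete calls such a cron job makes can be sketched as below. This is a minimal sketch, not our actual script: the repository name `nightly_fs_repo`, the `<index>-MMDD` naming scheme and the `es_url` default are hypothetical. The endpoints themselves (`PUT /_snapshot/<repo>/<snapshot>` with an `indices` body, `DELETE /_snapshot/<repo>/<snapshot>`) are the standard Elasticsearch snapshot API.

```python
import json
import urllib.request
from datetime import date

REPO = "nightly_fs_repo"  # hypothetical repository name


def snapshot_name(index: str, day: date) -> str:
    # Hypothetical naming scheme: index name plus an MMDD suffix, e.g. "graylog_5-0901".
    return f"{index}-{day:%m%d}"


def create_snapshot_request(index: str, day: date, repo: str = REPO):
    # One snapshot per index; the global cluster state is left out.
    path = f"/_snapshot/{repo}/{snapshot_name(index, day)}?wait_for_completion=true"
    body = {"indices": index, "include_global_state": False}
    return ("PUT", path, json.dumps(body))


def delete_snapshot_request(snapshot: str, repo: str = REPO):
    return ("DELETE", f"/_snapshot/{repo}/{snapshot}", None)


def send(request, es_url: str = "http://localhost:9200"):
    # Sends a (method, path, body) tuple with urllib; es_url is an assumption.
    method, path, body = request
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(es_url + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping the request construction separate from `send` makes it easy to print exactly what the job will do before running it against the cluster.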
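Because the share is mounted by hand, one cheap safeguard is to have the cron job refuse to run unless the repository folder is actually a mount point. A minimal sketch, assuming the job runs on a node where the repository path is mounted (the path you pass in is up to you). Elasticsearch's own `POST /_snapshot/<repo>/_verify` call can additionally confirm that every node sees the repository.

```python
import os


def repo_is_mounted(repo_path: str) -> bool:
    # True only if repo_path is itself a mount point, i.e. the NFS share is
    # mounted there rather than the bare local folder underneath it.
    return os.path.ismount(repo_path)


def guard(repo_path: str) -> None:
    # Abort the nightly job before any snapshot is deleted or created.
    if not repo_is_mounted(repo_path):
        raise SystemExit(f"{repo_path} is not mounted; skipping snapshots")
```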
I checked the logs and didn't see any errors, but there are a lot of logs, so I'm not sure I'm right.
The restarted nodes' logs are empty at the time the snapshots start.
Do you have any idea where to start debugging? Or have you seen the same error before?
I did a little debugging and found some new information.
The process takes the snapshot from the beginning for every index that has a replica shard on the restarted node (restarting just the service is enough).
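To predict which snapshots will be full after a restart, one can list the indices that have a replica shard on the restarted node. A minimal sketch, assuming the input is the parsed JSON from `GET _cat/shards?format=json` (whose rows carry `index`, `shard`, `prirep` and `node` fields); the node name used below is hypothetical.

```python
def indices_with_replica_on(shards, node):
    # prirep is "p" for primaries and "r" for replicas; unassigned shards
    # have node set to None, so they never match.
    return sorted({s["index"] for s in shards
                   if s.get("prirep") == "r" and s.get("node") == node})


sample = [
    {"index": "graylog_4", "shard": "0", "prirep": "p", "node": "es-node-03"},
    {"index": "graylog_4", "shard": "0", "prirep": "r", "node": "es-node-07"},
    {"index": "graylog_5", "shard": "1", "prirep": "r", "node": "es-node-03"},
    {"index": "graylog_6", "shard": "0", "prirep": "r", "node": None},
]
```

Comparing this list with the snapshots that grew unexpectedly would confirm (or rule out) the replica correlation.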
I tried to set the following, but I only get INFO-level logs about the snapshots in the master node's log.
So the job deletes only the old snapshots, not all of them. The frontend app rotates the indices, and I delete snapshots that are 5 days old, so for the currently active indices I have 4-5 snapshots.
E.g.: Index A - snapshots 0901, 0902, 0903, 0904 (date as month and day)...
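The 5-day rotation on MMDD-named snapshots can be sketched as follows. This is a minimal sketch, assuming each snapshot name ends in the four MMDD digits; since the names carry no year, a date that would lie in the future is taken to be from the previous year. Feb 29 in a non-leap year is not handled.

```python
from datetime import date, timedelta


def parse_mmdd(name: str, today: date) -> date:
    # Assumes the last four characters are MMDD, e.g. "0901" or "graylog_5-0901".
    mmdd = name[-4:]
    d = date(today.year, int(mmdd[:2]), int(mmdd[2:]))
    if d > today:
        # Across New Year, December snapshots belong to last year.
        d = d.replace(year=today.year - 1)
    return d


def snapshots_to_delete(names, today: date, keep_days: int = 5):
    # Delete snapshots dated keep_days or more days before today.
    cutoff = today - timedelta(days=keep_days)
    return [n for n in names if parse_mmdd(n, today) <= cutoff]
```

With `today = date(2019, 9, 6)` and snapshots 0901 to 0905, only 0901 is selected, leaving four snapshots per active index.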
And now my problem; with your terminology it is simpler to describe.
The snapshots work well: the first one is a full snapshot and the later ones are incremental. Because the frontend rotates the active write index, most snapshots take only 1-2 seconds.
BUT if I restart one server, when I run my script the snapshots are incremental, EXCEPT for the restarted server's indices (those with a replica shard on the restarted server): for these, Elasticsearch takes a full snapshot instead of an incremental one.