Elasticsearch snapshot repository fails validation

Yes, 1/200 = 5/1000 = 0.005. It appears we are in agreement on how math works. In any case, I think a lot of this time is spent in the SSH connection anyway, so I took your earlier advice:

I spent yesterday afternoon writing a very simple client/server program: GitHub - amfournda/manualfiletester. The server listens on a port and waits for a client to connect. When a client connects, the server writes a short random string to a file and then sends that same string to the client; the client opens the file and prints both the string it received over the network and the contents of the file. In my case this file lives on the same shared directory Elasticsearch is trying to use for snapshots. It works perfectly:

Server:

root@facility-es02:~/filetest# ./server
Took 0 seconds and 23576 microseconds to write to manualtest: SBQm91FtIkP3Rk4g
Took 0 seconds and 377 microseconds to sent string to client: SBQm91FtIkP3Rk4g
Took 0 seconds and 31993 microseconds to write to manualtest: ktqh4Etf3hMW2Mt2
Took 0 seconds and 66 microseconds to sent string to client: ktqh4Etf3hMW2Mt2
Took 0 seconds and 29797 microseconds to write to manualtest: pGDbmlBeWNqsV28b
Took 0 seconds and 79 microseconds to sent string to client: pGDbmlBeWNqsV28b

Client:

root@facility-es01:~/filetest# ./client
Took 0 seconds and 25376 micros to read string from server: SBQm91FtIkP3Rk4g
Took 0 seconds and 2215 micros to see file contains SBQm91FtIkP3Rk4g
root@facility-es01:~/filetest# ./client
Took 0 seconds and 32622 micros to read string from server: ktqh4Etf3hMW2Mt2
Took 0 seconds and 1651 micros to see file contains ktqh4Etf3hMW2Mt2
root@facility-es01:~/filetest# ./client
Took 0 seconds and 30944 micros to read string from server: pGDbmlBeWNqsV28b
Took 0 seconds and 1539 micros to see file contains pGDbmlBeWNqsV28b

As you can see, the state of the disk is always consistent with what gets sent over the network, no matter how many times I run this. As long as one NFS client calls close() on a file before another client tries to open it, the file exists and contains the expected contents every single time. Will you agree that this demonstrates my NFS server is providing close-to-open consistency, which is all NFS should be expected to guarantee?
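For anyone who wants to check the same thing without building the test program, a rough shell equivalent would be something like the following (hypothetical paths; it assumes the same NFS export is mounted at /tmp/elastic on both nodes). The redirection closes the file before the second command opens it, which is exactly the ordering that close-to-open consistency covers:

# On the writer node (e.g. one NFS client): write a token and close the file.
TOKEN=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 16)
echo "$TOKEN" > /tmp/elastic/manualtest   # fd closes when the redirection ends
echo "wrote: $TOKEN"

# On the reader node (a second NFS client): open the file only after the write has closed it.
echo "read:  $(cat /tmp/elastic/manualtest)"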

Hmm that is strange indeed. Would you run this on the master and the problematic data node:

strace -ttt -s256 -f -p $PID -etrace=open,write,close,fsync,mkdir,rename,read -o strace-$PID.log

Then execute a repo verify, then Ctrl+C both strace processes, and finally share the logs? I can arrange a private channel for sharing files if you have concerns about leaking something secret.
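If it's useful, one way to grab the PID to pass to strace (assuming Elasticsearch runs under systemd as elasticsearch.service) would be something like:

PID=$(systemctl show -p MainPID --value elasticsearch.service)
echo $PID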

Oh, if possible, could you do this on a cluster that's not doing anything else? If this is your production cluster, would you try to reproduce it on a separate test cluster that exists only for these tests?

I'll message you the log files on the ES Slack.

For anyone following along, the problem actually has to do with systemd giving the service a private, per-process /tmp (its PrivateTmp feature) that takes precedence over anything else that might be mounted there. As you can see from grepping the Elasticsearch process's mounts, it is given its own /tmp dir:

root@facility-es01:~# grep /tmp /proc/11500/mounts
/dev/vda1 /tmp ext4 rw,relatime,errors=remount-ro 0 0
/dev/vda1 /var/tmp ext4 rw,relatime,errors=remount-ro 0 0
backup0.ftc:/pool0/elasticsnapshots /tmp/elastic nfs rw,sync,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.40.8.12,mountvers=3,mountport=36300,mountproto=udp,lookupcache=none,local_lock=none,addr=10.40.8.12 0 0
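A quick way to confirm this, assuming the service unit is named elasticsearch.service, is to ask systemd directly whether PrivateTmp is enabled for the unit:

systemctl show -p PrivateTmp elasticsearch.service

On an affected node this should report PrivateTmp=yes, which is what produces the private /tmp and /var/tmp mounts shown above.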

This meant that any writes to /tmp/elastic/ were not going to my NFS mount, as that mount was being clobbered by systemd's per-process /tmp. Moving the NFS mount into /mnt/ (where it probably should have been anyway) fixes the issue, and snapshot repo verification now works properly:

root@facility-kibana:~# curl -s -XPOST 'localhost:9200/_snapshot/backup0/_verify'| jq
{
  "nodes": {
    "_AAzUpbdSLmH8Eu2fATC8w": {
      "name": "facility-es03.ftc"
    },
    "PFSVaur8STWZap533rLEXg": {
      "name": "facility-es01.ftc"
    },
    "gUK92CyURGKaIh1AYYxFUA": {
      "name": "facility-es02.ftc"
    }
  }
}
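In case it helps anyone else who lands here, the change amounted to roughly the following on each node (mount options abbreviated; the repository name, export, and paths match my setup, so treat this as a sketch and adjust to yours):

# Move the NFS mount out of /tmp (and persist the new mount point in /etc/fstab)
umount /tmp/elastic
mkdir -p /mnt/elastic
mount -t nfs -o vers=3,hard,sync backup0.ftc:/pool0/elasticsnapshots /mnt/elastic

# Make sure the new location is whitelisted in elasticsearch.yml on every node,
# then restart the node:
#   path.repo: ["/mnt/elastic"]

# Re-register the repository so it points at the new path
curl -s -XPUT 'localhost:9200/_snapshot/backup0' -H 'Content-Type: application/json' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/elastic" }
}'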

I want to thank David Turner for his time looking into this issue and I appreciate his expertise.
