Lost es snapshots


(Tim Dunphy) #1

Hey guys,

I'm running Elasticsearch 1.7.1, and I'm taking snapshots of the cluster using this cron job:

0 8 * * Sat /bin/curl --user admin:$ES_PASS -XPUT "http://logs.example.com:9200/_snapshot/jf_backup/snapshot_$(date +\%Y-\%m-\%d)"
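
A note on running this from cron: cron treats an unescaped `%` in a crontab command as a newline, so a `date` format string needs `\%` there, or the command can live in a script instead. Below is a minimal sketch of such a script, assuming the host and repository name from this post. The function names `snapshot_name` and `snapshot_url` and the `--run` flag are made up for illustration, and `ES_PASS` is assumed to be set in cron's environment.

```shell
#!/bin/sh
# Hypothetical wrapper script, so the crontab entry itself contains no '%'.
ES_URL="http://logs.example.com:9200"   # host from the post
REPO="jf_backup"                        # repository name from the post

# Build a dated snapshot name such as snapshot_YYYY-MM-DD.
snapshot_name() {
    printf 'snapshot_%s' "$(date +%Y-%m-%d)"
}

# Build the full URL of today's snapshot inside the repository.
snapshot_url() {
    printf '%s/_snapshot/%s/%s' "$ES_URL" "$REPO" "$(snapshot_name)"
}

# Only contact the cluster when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    curl --silent --show-error --user "admin:$ES_PASS" -XPUT "$(snapshot_url)"
fi
```

The crontab line would then be something like `0 8 * * Sat /path/to/es_snapshot.sh --run`, where the script path is a placeholder.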

Yet, when I try to list my backups with the following command, I get a curious result:

    [root@logs:/etc/elasticsearch] #curl  -XGET 'http://logs.example.com:9200/_snapshot'
    {}

It appears that backups are in fact happening, since the mount point I established for the backups is filling up:

[root@logs:/etc/elasticsearch] #grep path.repo elasticsearch.yml
path.repo: ["/mnt/backup", "/mnt/backup/jf_backup"]

[root@logs:~] #du -sh /mnt/backup/jf_backup
2.8G    /mnt/backup/jf_backup

So I'm just wondering what I'm getting wrong here? Any advice on this problem would be welcomed!

Thanks


(Mark Walkom) #2

Does it mention anything in the logs?
Does /_snapshot/_all show anything else? Can you verify the repo?


(Tim Dunphy) #3

Warkolm,

Nope!! No action in the logs at all when I hit the server with that snapshot command. I was tailing the logs, put some distance at the end by hitting return a few times, then issued the command again. Nothing happened in the logs!

And /_snapshot/_all shows the same thing as my OP

[root@logs:~] #curl -XGET 'http://localhost:9200/_snapshot/_all'
{}

Ran that command from the first ES server itself.

And if I try to verify the repo, I get a 404:

#curl -XPOST 'http://localhost:9200/_snapshot/jf_backup/_verify?pretty=true'
{
  "error" : "RepositoryMissingException[[jf_backup] missing]",
  "status" : 404
}
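
A 404 with `RepositoryMissingException` means the cluster has no repository registered under that name, whatever files exist on disk. A guard like the following could run before each snapshot. It is only a sketch: `repo_registered` is a hypothetical helper name, and it assumes `GET /_snapshot/<repo>` answers 200 for a registered repository and 404 otherwise, in line with the response above.

```shell
#!/bin/sh
ES_URL="http://localhost:9200"   # host from the post

# Succeed only if GET /_snapshot/<repo> answers with HTTP 200.
# repo_registered is a hypothetical helper name.
repo_registered() {
    status="$(curl --silent --output /dev/null --write-out '%{http_code}' \
        "$ES_URL/_snapshot/$1")"
    [ "$status" = "200" ]
}

# Example use (needs a reachable cluster):
# repo_registered jf_backup || echo "repository jf_backup is not registered" >&2
```

A scheduled job using this check would report the missing repository instead of silently failing.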

But if that's truly the case, and there are no backups, then why would the backup directory have data in it?

[root@logs:/etc/elasticsearch] #grep path.repo elasticsearch.yml
path.repo: ["/mnt/backup", "/mnt/backup/jf_backup"]

[root@logs:/etc/elasticsearch] #du -sh /mnt/backup/jf_backup/
2.8G    /mnt/backup/jf_backup/

And I can see this in the backup directories:

[root@logs:/etc/elasticsearch] #ls -lh /mnt/backup/jf_backup/
total 12K
-rw-r--r--.  1 elasticsearch elasticsearch    0 Aug 23 00:27 index
drwxr-xr-x. 55 elasticsearch elasticsearch 4.0K Aug 23 00:23 indices
-rw-r--r--.  1 elasticsearch elasticsearch 5.1K Aug 23 00:23 metadata-snapshot_1
-rw-r--r--.  1 elasticsearch elasticsearch    0 Aug 23 08:32 metadata-snapshot_2
-rw-r--r--.  1 elasticsearch elasticsearch    0 Aug 23 00:27 snapshot-snapshot_1
-rw-r--r--.  1 elasticsearch elasticsearch    0 Aug 23 08:32 snapshot-snapshot_2

I'm not certain why that data's there at all if ES thinks there are no backups to be found. Odd stuff!

Thanks,
Tim


(Mark Walkom) #4

Those are from August though. Perhaps someone set up a repo previously, ran some backups, and then removed the repo?


(Tim Dunphy) #5

Quite possibly. But it appears that the backups had gotten a little messed up! So I decided to scrap what was there and just start again. I rm'd everything in the backup directory and bounced all nodes in the cluster, then re-created the repo:

curl --user admin:$ES_PASS -XPOST "http://logs.example.com:9200/_snapshot/jf_backup" -d'
{
    "type": "fs",
    "settings": {
        "location": "/mnt/backup/jf_backup/"
    }
}'

Set some throttling options:

curl --user admin:secret -XPOST "http://logs.example.com:9200/_snapshot/jf_backup" -d'
{
    "type": "fs",
    "settings": {
        "location": "/mnt/backup/jf_backup",
        "max_snapshot_bytes_per_sec": "50mb",
        "max_restore_bytes_per_sec": "50mb"
    }
}'

Then listed the backup repo:

curl -XGET 'http://logs.example.com:9200/_snapshot/_all?pretty=true'
{
  "jf_backup" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/backup/jf_backup",
      "max_restore_bytes_per_sec" : "50mb",
      "max_snapshot_bytes_per_sec" : "50mb"
    }
  }
}

Took a new backup:

curl --user admin:$ES_PASS -XPUT "http://logs.example.com:9200/_snapshot/jf_backup/snapshot_$(date +%Y-%m-%d)"
{"accepted":true}
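
Note that `{"accepted":true}` only says the snapshot request was accepted, not that the snapshot finished. One way to check the outcome is to fetch the snapshot's info and read its `state` field (values such as `SUCCESS`, `IN_PROGRESS`, `PARTIAL`, or `FAILED`). A rough sketch, assuming the host and repository from this post. `snapshot_state` is a hypothetical helper, and a proper JSON tool would be more robust than `sed`.

```shell
#!/bin/sh
ES_URL="http://logs.example.com:9200"   # host from the post
REPO="jf_backup"                        # repository name from the post

# Read snapshot-info JSON on stdin and print the first "state" value found.
# snapshot_state is a hypothetical helper name.
snapshot_state() {
    sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Example use (needs a reachable cluster):
# curl --silent --user "admin:$ES_PASS" \
#     "$ES_URL/_snapshot/$REPO/snapshot_$(date +%Y-%m-%d)" | snapshot_state
```

Alternatively, adding `?wait_for_completion=true` to the snapshot request makes the call block until the snapshot is done.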

And was able to verify the repo on all three nodes:

#curl -XPOST 'http://logs.example.com:9200/_snapshot/jf_backup/_verify?pretty=true'
{
  "nodes" : {
    "95AQx2aiRp64m7S5VlLa4g" : {
      "name" : "JF_ES3"
    },
    "QS4xFC8DRiSVo9wZE3wTZA" : {
      "name" : "JF_ES1"
    },
    "VjxijoX4S1ySOkKPPSYztw" : {
      "name" : "JF_ES2"
    }
  }
}

Thanks

