Moving a snapshot between computers (again)

I know things like this have been discussed here, but I have not found a complete answer that matches my situation or what I observe on the screen, so I have to ask again.

If I make a snapshot of an index on one computer, how do I get it to a different computer using a file system that is not shared?

On a development machine and a production machine I have installed Elasticsearch and Kibana from the exact same jars and configured them with the same users. The one slight difference is that their unrelated filesystems are structured differently: one has a path.repo of /home/me/backups and the other uses /work/me/backups. My elasticsearch user is a superuser.
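For reference, the relevant setting looks something like this in each machine's elasticsearch.yml (paths taken from the description above):

```yaml
# elasticsearch.yml on the development machine
path.repo: ["/home/me/backups"]

# elasticsearch.yml on the production machine
path.repo: ["/work/me/backups"]
```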

On the development machine I create an index, a repository, and a snapshot, the latter two with Kibana. On the production machine I create the repository with Kibana. Both repos have the same name. I copy (sftp) all the files from /home/me/backups on the development machine to /work/me/backups on the production machine. I go to Snapshot and Restore on the production machine and see the repository I had created, but there are no snapshots, of course. It doesn't know about the files I just copied over.
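If it helps, I believe the Kibana steps correspond roughly to these requests (names are placeholders, in the same style the docs use):

```
PUT _snapshot/<reponame>
{
  "type": "fs",
  "settings": {
    "location": "<repolocation>"
  }
}

PUT _snapshot/<reponame>/<snapname>?wait_for_completion=true
```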

How do I tell it (the production machine) about the snapshot so that it can be restored, or what else might need to happen in order to use the index?

Hi @kwalcock and welcome! I think you're looking for these docs:

In particular:

When restoring a repository from a backup, you must not register the repository with Elasticsearch until the repository contents are fully restored. If you alter the contents of a repository while it is registered with Elasticsearch then the repository may become unreadable or may silently lose some of its contents.

Thanks for the valiant attempt, but it doesn't get me further.

"When restoring a repository from a backup, "

I guess that backup here might be the same as a snapshot.

"you must not register the repository with Elasticsearch"

Did Kibana register the repository for me? I believe I tried it without telling the production machine about a repo via Kibana, but then Kibana didn't know what to do and could not find the repo or the snapshot.

"until the repository contents are fully restored."

OK. But how does one even start to restore the repo (snapshot)?

"If you alter the contents of a repository while it is registered with Elasticsearch then the repository may become unreadable or may silently lose some of its contents."

Nobody is altering anything other than by trying to restore the snapshot.

From the nearby page "Restore a snapshot",

"GET _snapshot"

{
  "<reponame>": {
    "type": "fs",
    "settings": {
      "location": "<repolocation>"
    }
  }
}

shows the repo that I created with Kibana. IIRC it didn't show anything before I created it.

"GET _snapshot/*/*?verbose=false"

shows no snapshots, which matches what Kibana tells me. So I am left with "How do I tell the production machine about the snapshot so that it can be restored?" or some variation like "How do I restore it so that Elasticsearch knows about it?"

You need to register the repository with Elasticsearch after you've copied all the files over, i.e. do those two steps in the other order: copy the files across first, and only then register the repository on the production machine.
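In other words, on the production machine, something like this, but only after the sftp copy has finished (`<reponame>` and `<repolocation>` are placeholders as above):

```
PUT _snapshot/<reponame>
{
  "type": "fs",
  "settings": {
    "location": "<repolocation>"
  }
}

GET _snapshot/<reponame>/*?verbose=false
```

The snapshot copied over from the development machine should then be listed.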


Thanks! That worked.

I deleted the repo with Kibana and then recreated it and the snapshot showed up. I did some manipulation of the associated directory so that no files would be deleted along with the repo, but that didn't seem to affect anything.
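For anyone following along: once the snapshot shows up, the restore itself is a single request (the snapshot name here is a placeholder):

```
POST _snapshot/<reponame>/<snapname>/_restore
```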

The problem was mostly with the idea that the repo location is something that is monitored for the addition of new snapshots. That doesn't seem to be the case for snapshots on the file system. If I made a second snapshot on the development machine and moved it to the production machine, the production machine probably wouldn't realize it was there. I would need to delete the repo and recreate it in order for it to check. Is that correct?

Right, otherwise that counts as "altering the contents of a repository while it is registered with Elasticsearch" which the docs I linked say very much not to do. If Elasticsearch is writing to the repo at the same time as you're doing stuff to it, chaos will ensue.

If you want to keep the repo up to date by repeatedly updating its contents with sftp, an alternative that'd work better would be to tell Elasticsearch that the repo contents are owned by something external (i.e. your sftp process) and that it shouldn't touch them, by adding the setting readonly: true (not sure if that's possible in Kibana but it's an option in the underlying API). If you do that then ES will re-scan its contents each time you access it. Note that you might still get transient errors if ES happens to be trying to read it while you're updating its contents, but they'll go away on a retry.
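Registering the repository as read-only via the underlying API would look something like this (same placeholder names as above):

```
PUT _snapshot/<reponame>
{
  "type": "fs",
  "settings": {
    "location": "<repolocation>",
    "readonly": true
  }
}
```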

I will try that. Thanks!