Conundrum with repositories

When I query my repositories using the API, I get a list of the repositories as expected:

http://localhost:9200/_snapshot/*?pretty

{
  "version6" : {
    "type" : "fs",
    "settings" : {
      "location" : "/data/elasticsearch/backups/version6"
    }
  },
  "version7" : {
    "type" : "fs",
    "settings" : {
      "compress" : "false",
      "location" : "/data/elasticsearch/backups/version7"
    }
  },
  "daily" : {
    "type" : "fs",
    "settings" : {
      "compress" : "true",
      "location" : "/data/elasticsearch/backups/daily"
    }
  }
}

But when I try to access one of the repositories, ES says it does not exist:

http://localhost:9200/_cat/snapshots/daily?pretty

{
  "error" : {
    "root_cause" : [
      {
        "type" : "repository_missing_exception",
        "reason" : "[daily] missing"
      }
    ],
    "type" : "repository_missing_exception",
    "reason" : "[daily] missing"
  },
  "status" : 404
}

This is the same for all of the listed repositories.

The repositories are on a shared filesystem, and I recently moved everything to a different system with a lot more disk space. I verified that everything looked OK, then created the daily repository and did a full backup, which worked. A week later I found the situation above.

Any ideas what is going on?

I had a light bulb moment.

It had to be path.repo, and when I looked at the configuration it was empty, caused by an issue with the Puppet configuration.
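
For the record, the setting needs to list every repository location, and it has to be present on every master and data node. A minimal elasticsearch.yml sketch, using the paths from the repository settings above:

path.repo:
  - /data/elasticsearch/backups/version6
  - /data/elasticsearch/backups/version7
  - /data/elasticsearch/backups/daily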

What confused me was that parts of the system seemed to know about the repos and other parts did not. If the repositories had completely vanished I am pretty sure I would have fingered the problem much sooner!

Agreed that this isn't great; I would expect to see errors in the logs in this situation too.

The discrepancy is because GET _snapshot/* returns a list of repositories according to the cluster config, regardless of whether that config is ok or not, whereas GET _cat/snapshots/daily is trying to list the snapshots within the repository called daily, so it's closer to GET _snapshot/daily/*, i.e. the get snapshots API.
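
To make the distinction concrete (same localhost examples as above, just a sketch):

# reads the repository definitions straight from the cluster state, so it succeeds even if path.repo is broken
curl 'http://localhost:9200/_snapshot/*?pretty'

# has to open the repository on disk to list its snapshots, which is where the failure surfaces
curl 'http://localhost:9200/_snapshot/daily/*?pretty'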

I think you can only get into this state if you start out with a valid config and then move to a different system where it's not valid any more. Because you can do this kind of move in a rolling fashion, there's not really a point at which we can block your progress to let you know something is wrong.

What happened was that (due to my fat fingers :wink:) Puppet pushed a yml file with path.repo empty. There may well have been errors I missed in the logs, but as we know it is really difficult to find meaningful stuff in the logs with all the stack traces. Sigh...

BTW is there a way of getting ES to load a new configuration file without restarting the node?

Puppet pushes out the changes, and I used to restart the nodes from Puppet, but that happens at random intervals. I soon learnt this is a really bad idea, as you can end up with the two master-eligible nodes down at the same time :(

As a general rule, any stack trace in the logs is meaningful; if it isn't, I'd count it as a bug (we have definitely fixed such bugs in the past).

No, sorry, a rolling restart is needed for changes to the config file. That's a deliberate feature for things like path.repo since that setting influences the security manager and we definitely don't want to be able to reconfigure the security manager at runtime.
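
If it helps, after you restart each node during the rolling restart you can check what it actually picked up; a sketch using the nodes info API (the filter_path trimming is optional):

curl 'http://localhost:9200/_nodes/_local/settings?pretty&filter_path=nodes.*.settings.path'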

Oh, I know that the stack traces are necessary for diagnostics, but they make the logs really hard to scan when you don't know what you are looking for! It's the "something went wrong, is there anything unusual in the logs?" scenario, where you only get one message on the screen at a time!

I must start piping the logs through cut for the first pass!
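
Something like this as a first pass, perhaps; the log path is just an example, and the pattern assumes the usual indented "at ..." / "Caused by:" stack trace lines:

grep -vE '^([[:space:]]+at |Caused by:|[[:space:]]+\.\.\. [0-9]+ more)' /var/log/elasticsearch/elasticsearch.log | cut -c1-160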
