Best practice for creating an index when an ES docker container starts

When running an initialization script to create a new index on startup, authentication fails with the default username and password (elastic:changeme). Waiting a few seconds and then manually running the exact same command works perfectly. My current workaround is to call the health status endpoint in a loop until the request no longer fails with an authentication error, and then run my index init script. This feels hacky, and I'm wondering if there's a better way to create an index automatically when the container starts.

Elasticsearch Version
Version: 5.4.0, Build: 780f8c4/2017-04-28T17:43:27.229Z

Plugins
analysis-icu

JVM
JVM: 1.8.0_131

OS
Linux 4464d613c8d6 4.9.60-linuxkit-aufs #1 SMP Mon Nov 6 16:00:12 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Inside official Docker image for 5.4, though with a different docker-entrypoint.sh script

Steps to reproduce:
Dockerfile

FROM docker.elastic.co/elasticsearch/elasticsearch:5.4.0
RUN bin/elasticsearch-plugin install analysis-icu

COPY docker-entrypoint.sh /docker-entrypoint.sh
COPY config/elasticsearch.yml config/elasticsearch.yml
COPY config/setup.sh config/setup.sh
RUN mkdir utils
COPY utils/wait-for-it.sh utils/wait-for-it.sh

USER root
RUN chmod +x /docker-entrypoint.sh utils/wait-for-it.sh config/setup.sh
RUN chown -R elasticsearch:elasticsearch /docker-entrypoint.sh utils/wait-for-it.sh config/setup.sh

USER elasticsearch
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["/usr/share/elasticsearch/bin/elasticsearch"]

elasticsearch.yml

cluster.name: "docker-cluster"
network.host: 0.0.0.0

# minimum_master_nodes need to be explicitly set when bound on a public IP
# set to 1 to allow single node clusters
# Details: https://github.com/elastic/elasticsearch/pull/17288
discovery.zen.minimum_master_nodes: 1

# By default the low water mark threshold will stop the node from starting 
# if your dev machine doesn't have at least 15% free. This is generally
# less of a mission critical threshold when in Docker on a dev machine
cluster.routing.allocation.disk.threshold_enabled: false

docker-entrypoint.sh

#!/bin/sh

# wait for Elasticsearch to start, then run the setup script to
# create and configure the index.
/usr/share/elasticsearch/utils/wait-for-it.sh localhost:9200 -- /usr/share/elasticsearch/config/setup.sh &
# replace this shell with the container command, keeping its arguments intact
exec "$@"

setup.sh

#!/bin/sh

echo Initiating Elasticsearch Custom Index
# move to the directory of this setup script
cd "$(dirname "$0")"

# for some reason even when port 9200 is open Elasticsearch is unable to be accessed as authentication fails
# a few seconds later it works
until curl -sSf -XGET --insecure --user elastic:changeme 'http://localhost:9200/_cluster/health?wait_for_status=yellow' > /dev/null; do
    printf 'AUTHENTICATION ERROR DUE TO X-PACK, trying again in 5 seconds \n'
    sleep 5
done

# create a new index with the settings in es_index_config.json
curl -v --insecure --user elastic:changeme -XPUT '0.0.0.0:9200/test?pretty' -H 'Content-Type: application/json' -d @es_index_config.json

wait-for-it.sh

logs:

[WARN ][o.e.x.s.a.AuthenticationService] [TL7bL2I] An unexpected error occurred while attempting to authenticate [elastic] against realm [reserved]
org.elasticsearch.ElasticsearchSecurityException: failed to authenticate user [elastic]
...
curl: (22) The requested URL returned error: 401

Then after the loop in setup.sh waits 5 seconds

[2018-04-03T02:00:17,596][INFO ][o.e.l.LicenseService     ] [TL7bL2I] license [bafce55a-1a94-43a4-b3c2-827ec185e237] mode [trial] - valid
* About to connect() to 0.0.0.0 port 9200 (#0)
*   Trying 0.0.0.0...
* Connected to 0.0.0.0 (0.0.0.0) port 9200 (#0)
* Server auth using Basic with user 'elastic'

There's a bunch of stuff underlying your question, some of which we can solve.

Firstly, the primary underlying cause of this is that Elasticsearch is a distributed system (or more specifically the way in which ES handles its distributed nature).
When a node starts up it needs to determine whether it should join a cluster that already exists, or whether it should form a new cluster. Once that has been determined, it will attempt to establish a functioning cluster and recover the state of that cluster, and any existing data.

A node may be up and running, but it's not useful until it gets to that point.

But all of those things take an indeterminate amount of time. For example you might start up a data-only node (node.master: false), and that node cannot tell you which indices exist or don't exist, or what the state of the cluster is until it can join a cluster that has a master node. And since that is dependent on another node becoming available on a different server somewhere on the network, it's impossible to predict when that happens.

In theory, we could provide a mechanism by which you could tell that node "Hey, run these commands once you join a functioning cluster". But there are all sorts of edge cases there.

  • What do we do if the cluster is up, but some indices are unavailable? If you're trying to insert data into those indices, then you'd want to wait. But if you're trying to create a new index, then you might want it to run straight away.
  • If the node gets shut down again before it joins a cluster, should it keep track of the commands it's supposed to run? Are they still meaningful at that point?
  • What do we do if those commands fail? How do you find out whether they succeeded? etc.

It's not unsolvable, but it's easy to do badly and incredibly hard to do well, so it's not a problem that we've chosen to tackle.

The second point of complexity that you're running into is that the "reserved realm" (which includes the elastic user) stores its state in a special index inside the cluster. This means that it is able to share state across the whole cluster (e.g. it has the same password on all nodes) but it also means that it is impossible to authenticate until that index is available. By direct implication, a node that has not yet formed a cluster cannot authenticate the elastic user.
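To make that waiting period concrete: while the security index is unavailable, requests as elastic come back with a 401, so the polling loop from the question is effectively waiting for the cluster to form. If you keep polling, a rough sketch like the one below at least bounds the wait. The helper names retry_until and es_auth_ready are made up for illustration, and the URL and credentials are simply the ones from the question.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds
# or the attempt budget is used up.
# usage: retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Succeeds only once the elastic user can authenticate.
# curl -f turns an HTTP error such as the 401 into a non-zero exit status.
es_auth_ready() {
    curl -sSf --user elastic:changeme 'http://localhost:9200/_cluster/health' > /dev/null
}
```

Something like `retry_until 24 5 es_auth_ready` (24 tries, 5 seconds apart, both arbitrary numbers) could then gate the index-creation curl, and the script can exit with an error instead of looping forever if the cluster never forms.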

If we can solve that second problem, then it opens up some opportunities to tackle the first problem.

What you should be able to do is use the File Realm to create a file-based superuser. Because that user is stored in files local to the node, it is available even if a cluster has not yet formed.
So include this in your Dockerfile:

RUN bin/x-pack/users useradd -r superuser -p changeme admin

And then you should be able to access the node as soon as it starts (well, as soon as the HTTP port is opened) with -u admin:changeme

Then, instead of polling you can do a single Cluster Health check, with wait_for_status=yellow (or one of the other wait_for_xxx options if you prefer).
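As a rough sketch of how setup.sh might look with that change (not tested against 5.4): the admin:changeme credentials come from the useradd line above, while the ES_URL and ES_AUTH variables, the function names, and the 60s timeout are assumptions of mine that you would adapt.

```shell
#!/bin/sh
# Assumed defaults; override through the environment if needed.
ES_URL=${ES_URL:-http://localhost:9200}
ES_AUTH=${ES_AUTH:-admin:changeme}   # file-realm user created by useradd

# Build the Cluster Health URL that blocks server-side until the
# requested status is reached or the timeout expires.
# usage: health_url STATUS TIMEOUT
health_url() {
    printf '%s/_cluster/health?wait_for_status=%s&timeout=%s' "$ES_URL" "$1" "$2"
}

# One blocking health request instead of a polling loop.
# usage: wait_for_cluster STATUS TIMEOUT
wait_for_cluster() {
    curl -sSf --user "$ES_AUTH" "$(health_url "$1" "$2")" > /dev/null
}

# Create an index from a JSON settings file.
# usage: create_index INDEX_NAME SETTINGS_FILE
create_index() {
    curl -sSf --user "$ES_AUTH" -XPUT "$ES_URL/$1?pretty" \
        -H 'Content-Type: application/json' -d "@$2"
}
```

With those in place the body of the script reduces to `wait_for_cluster yellow 60s && create_index test es_index_config.json`. It is worth checking on your version how a health request that times out is reported, so that the script does not go on to create the index against a cluster that never became ready.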


Thanks for the thorough answer Tim!

Adding the file-based superuser sounds like the ticket. My existing script was already using wait_for_status, so with the new credentials things should just work.

I'm having difficulty setting up this user though. I included the line you recommended in my Dockerfile but the credentials aren't working. If I run the command bin/x-pack/users list I'm told there are no users. Looking in the /usr/share/elasticsearch/config/x-pack directory I see the following:

ls -l
total 20
-rw-rw---- 1 elasticsearch elasticsearch 966 Jun 14  2017 log4j2.properties
-rw-rw---- 1 elasticsearch elasticsearch 473 Jun 14  2017 role_mapping.yml
-rw-rw---- 1 elasticsearch elasticsearch 197 Jun 14  2017 roles.yml
-rw-rw---- 1 elasticsearch elasticsearch   0 Jun 14  2017 users
-rw------- 1 elasticsearch elasticsearch  67 Apr  4 20:57 users4649812597072794813tmp
-rw-rw---- 1 elasticsearch elasticsearch   0 Jun 14  2017 users_roles
-rw------- 1 elasticsearch elasticsearch  16 Apr  4 20:57 users_roles4361431273720912395tmp

As you can see the users file has no content, but a temp file was created. On inspection the users and users_roles temp files have the correct details for the new user. So it seems the changes aren't being committed into the proper config files. Any suggestions? Some other details that may be useful:

curl -XGET -u elastic 'localhost:9200/_xpack/usage?pretty'
Enter host password for user 'elastic':
{
  "security" : {
    "available" : true,
    "enabled" : true,
    "realms" : {
      "file" : {
        "name" : [
          "default_file"
        ],
        "available" : true,
        "size" : [
          0
        ],
        "enabled" : true,
        "order" : [
          2147483647
        ]
      },

Ah, yes, the joy of Docker filesystem semantics.

I assume you're particularly tied to ES 5.4.x?
The latest 5.6 versions have completely rewritten that file handling code to avoid these sorts of issues (although it may just fail in a more explicit way).

I don't have a great solution for you I'm afraid.
The simplest option (but still a bit of a hack) is probably to delete the existing users and users_roles files before you run useradd. My best guess is that the rename is failing either while trying to get the properties (stat) of the existing file, or because it is being blocked from overwriting the existing file.
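A sketch of that workaround as a small build-time helper, on the untested assumption that the users tool is happy to create the files when they are missing. The function name recreate_file_user and its two arguments are made up for illustration, and the final grep relies on the users file holding one username:hash entry per line.

```shell
#!/bin/sh
# Hypothetical build-time helper.
# usage: recreate_file_user USERS_TOOL XPACK_CONFIG_DIR
recreate_file_user() {
    users_tool=$1
    xpack_dir=$2
    # Remove the empty files shipped in the image so the tool's
    # rename has nothing to stat or overwrite.
    rm -f "$xpack_dir/users" "$xpack_dir/users_roles"
    "$users_tool" useradd -r superuser -p changeme admin || return 1
    # Fail loudly if the user did not land in the real users file.
    grep -q '^admin:' "$xpack_dir/users"
}
```

In the image from the question this would be called as `recreate_file_user bin/x-pack/users config/x-pack`, or collapsed into a single Dockerfile step of the form RUN rm -f config/x-pack/users config/x-pack/users_roles && bin/x-pack/users useradd -r superuser -p changeme admin. Checking the file afterwards means a silent failure like the leftover temp files shows up at build time rather than at startup.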


Ahhh, unfortunately I need to use 5.4.x for the moment due to a vector scoring plugin that hasn't been updated yet. If the proof of concept works out I'll look into contributing to update the plugin.

In the meantime I'll try the deletion approach. TBH, for this prototype I can just create the index from a Python notebook as needed, but I feel I have an approach if I need to auto-create an index in the future for ES > 5.6.x

Thanks for your help.
