Duplicate fleet instances after Docker compose restart

We're using a single compose file for our on-prem Elastic Stack, which includes a fleet server as well as a default agent (currently running APM).

In general, this works and starts up nicely, BUT on every restart of the stack it keeps adding new agent instances. See screenshot below.

It's important for me to stress that, barring the duplicated entries, the whole thing works: the new instances behave as expected and I can easily remove the old ones, even if that's something I have to do manually. It's just that the duplication issue makes me think we're still doing something wrong, even if it's something minor.

The compose is as follows.

version: "3.9"

services:
  setup:
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    group_add: 
      - '1002'
    volumes:
      - /mnt/elastic_data/elastic_search/certs:/usr/share/elasticsearch/config/certs
    user: "0"
    command: >
      bash -c '
        if [ x${ELASTIC_PASSWORD} == x ]; then
          echo "Set the ELASTIC_PASSWORD environment variable in the .env file";
          exit 1;
        elif [ x${KIBANA_PASSWORD} == x ]; then
          echo "Set the KIBANA_PASSWORD environment variable in the .env file";
          exit 1;
        fi;
        if [ ! -f config/certs/ca.zip ]; then
          echo "Creating CA";
          bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip;
          unzip config/certs/ca.zip -d config/certs;
        fi;
        if [ ! -f config/certs/certs.zip ]; then
          echo "Creating certs";
          echo -ne \
          "instances:\n"\
          "  - name: elastic-search\n"\
          "    dns:\n"\
          "      - elastic-search\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          "  - name: kibana\n"\
          "    dns:\n"\
          "      - kibana\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          "  - name: apm\n"\
          "    dns:\n"\
          "      - apm\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          "  - name: fleet-server\n"\
          "    dns:\n"\
          "      - fleet-server\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          > config/certs/instances.yml;
          bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key;
          unzip config/certs/certs.zip -d config/certs;
        fi;
        echo "Setting file permissions"
        chown -R root:root config/certs;
        find . -type d -exec chmod 750 \{\} \;;
        find . -type f -exec chmod 640 \{\} \;;
        echo "Waiting for Elasticsearch availability";
        until curl -s --cacert config/certs/ca/ca.crt https://elastic-search:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
        echo "Setting kibana_system password";
        until curl -s -X POST --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://elastic-search:9200/_security/user/kibana_system/_password -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done;
        echo "All done!";
      '
    healthcheck:
      test: ["CMD-SHELL", "[ -f config/certs/elastic-search/elastic-search.crt ]"]
      interval: 1s
      timeout: 5s
      retries: 120

  elastic-search:
    depends_on:
      setup:
        condition: service_healthy
    restart: unless-stopped
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    group_add: 
      - '1002'
    volumes:
      - /mnt/elastic_data/elastic_search/certs:/usr/share/elasticsearch/config/certs
      - /mnt/elastic_data/elastic_search/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
      - /mnt/elastic_data/elastic_search/data:/usr/share/elasticsearch/data
      - /mnt/elastic_data/elastic_search/logs:/usr/share/elasticsearch/logs
    ports:
      - ${ELASTIC_PORT}:9200
    environment:
      - node.name=main-node
      - cluster.name=<REDACTED>
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=true
      - discovery.type=single-node
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.authc.api_key.enabled=true
      - xpack.security.http.ssl.key=certs/elastic-search/elastic-search.key
      - xpack.security.http.ssl.certificate=certs/elastic-search/elastic-search.crt
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.http.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=certs/elastic-search/elastic-search.key
      - xpack.security.transport.ssl.certificate=certs/elastic-search/elastic-search.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  kibana:
    depends_on:
      elastic-search:
        condition: service_healthy
    restart: unless-stopped
    image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
    group_add: 
      - '1002'
    volumes:
      - /mnt/elastic_data/elastic_search/certs:/usr/share/kibana/config/certs
      - /mnt/elastic_data/kibana/data:/usr/share/kibana/data
    ports:
      - ${KIBANA_PORT}:5601
    environment:
      - SERVER_NAME=<REDACTED>
      - ELASTICSEARCH_HOSTS=https://elastic-search:9200
      - ELASTICSEARCH_USERNAME=<REDACTED>
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}    
      - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt
      - XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY=UeHKV4ajMoTXqxuuVrsbyq7DRL3nwrN8
      - SERVER_SSL_ENABLED=true
      - SERVER_SSL_CERTIFICATE=config/certs/kibana/kibana.crt
      - SERVER_SSL_KEY=config/certs/kibana/kibana.key
      - SERVER_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt
      - SERVER_PUBLICBASEURL=<REDACTED>
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s --cacert config/certs/ca/ca.crt -I https://localhost:5601 | grep 'HTTP/1.1 302 Found'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  fleet-server:
      hostname: fleet-server
      group_add: 
        - '1002'
      depends_on:
        kibana:
          condition: service_healthy
        elastic-search:
          condition: service_healthy
      image: docker.elastic.co/beats/elastic-agent:${STACK_VERSION}
      volumes:
        - /mnt/elastic_data/elastic_search/certs:/certs
      healthcheck:
        test: "curl -s --cacert /certs/ca/ca.crt https://127.0.0.1:8220/api/status | grep 'HEALTHY'"
        retries: 12
        interval: 5s
      entrypoint: [sh, -c, "until curl -s --cacert /certs/ca/ca.crt -I https://kibana:5601 | grep 'HTTP/1.1 302 Found'; do sleep 10; done && /usr/bin/tini -- /usr/local/bin/docker-entrypoint"]
      ports:
        - ${FLEET_PORT}:8220
      restart: unless-stopped
      user: root
      environment:
        - KIBANA_HOST=https://kibana:5601
        - KIBANA_CA=/certs/ca/ca.crt
        - KIBANA_USERNAME=<REDACTED>
        - KIBANA_PASSWORD=${ELASTIC_PASSWORD}
        - ELASTICSEARCH_HOST=https://elastic-search:9200
        - ELASTICSEARCH_USERNAME=elastic
        - ELASTICSEARCH_PASSWORD=${ELASTIC_PASSWORD}
        - ELASTICSEARCH_CA=/certs/ca/ca.crt
        - KIBANA_FLEET_SETUP=true
        - FLEET_SERVER_ENABLE=true
        - FLEET_SERVER_HOST=0.0.0.0
        - FLEET_SERVER_PORT=8220
        - FLEET_SERVER_CERT=/certs/fleet-server/fleet-server.crt
        - FLEET_SERVER_CERT_KEY=/certs/fleet-server/fleet-server.key
        - FLEET_URL=https://fleet-server:8220
        - FLEET_CA=/certs/ca/ca.crt
        - FLEET_ENROLL=false

  default-agent:
      hostname: default-agent
      depends_on:
        fleet-server:
          condition: service_healthy
      image: docker.elastic.co/beats/elastic-agent:${STACK_VERSION}
      entrypoint: [sh, -c, "until curl -s <REDACTED>| grep 'HEALTHY'; do sleep 10; done && /usr/bin/tini -- /usr/local/bin/docker-entrypoint"]
      expose:
        - 8200
      restart: unless-stopped
      user: root
      environment:
        - FLEET_ENROLLMENT_TOKEN=<REDACTED>
        - FLEET_URL=<REDACTED>
        - FLEET_ENROLL=true

Some public URLs or specific names were redacted. Note that the whole setup is using nginx with LE to publish certain URLs using https://github.com/nginx-proxy/acme-companion/blob/main/docs/Docker-Compose.md. The configuration sections for this have been removed from the above and all remaining URLs are internal to docker.

Two questions:

a) The above compose file is... messy - it's been slapped together and tweaked on the fly whilst trying to make everything work, and could definitely be improved and cleaned up vs what is currently there. And now that it's finally working I'm a bit scared to touch it for fear of breaking anything. Does anyone have an example Docker compose file for a full fleet setup, including a default agent, which could be used as a sensible template?

b) Assuming no full compose file exists for a), any idea which setting (present or missing) is responsible for the agent duplication issue from the screenshot above?

Did you set up persistent volumes for both the fleet-server and the agent?

I do not run them in Docker, but if I'm not mistaken they need a persistent volume, or else it will be a new installation every time you run docker compose.

Hmm, no. I've only mounted the certificate folder so it can be used by the config. I guess I can add that in. Any idea which directories need to be mapped for the fleet server and agent for this to work?

Actually I'm not sure this is the right approach as I do not use Elastic Stack and Elastic Agents on containers.

But I saw that there is specific documentation about running Elastic Agent in containers.

Not sure about the persistent volume as it is not mentioned.

Yeah, that's the thing - I've been over that documentation left and right, and without something more specific to grab on to I'm back at square one. I can take a peek at the running containers and see if there's an easily identifiable data location of sorts I can mount to, perhaps, add that persistence.

Otherwise it seems a bit weird that a dockerized solution (based on the documentation) would work this way, becoming a "fresh instance" on every container restart.

I've tried mapping the data directories from the fleet server and default agent to some docker volumes. Alas, when the stack is restarted the fleet elements become duplicated as initially described. :frowning: Any other ideas?

Unfortunately no, I do not use Docker.

You will need to wait to see if someone from Elastic can answer your question.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Turns out this was indeed the case. And I was a bit of a Docker newbie for not realising how to figure this one out sooner.

The solution was fairly simple. I had to bind the following folder via the compose file:

    volumes:
      - <fleet-server-folder-or-volume>:/usr/share/elastic-agent/state

In my case I'm using a dedicated drive for the Elastic Stack, but this could just as well be a Docker volume.

The same target folder (/usr/share/elastic-agent/state) is used by Elastic Agents that aren't acting as fleet servers, so the default-agent service needs an equivalent mount.
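Put together, the relevant additions look roughly like this (a sketch only - the volume names `fleet_server_state` and `default_agent_state` are placeholders; named Docker volumes and host bind mounts both work):

    services:
      fleet-server:
        volumes:
          - fleet_server_state:/usr/share/elastic-agent/state
      default-agent:
        volumes:
          - default_agent_state:/usr/share/elastic-agent/state

    volumes:
      fleet_server_state:
      default_agent_state:

With the enrollment state persisted across restarts, the agents re-check in under their existing identities instead of enrolling as new instances.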

While this falls under "how to use Docker", I figured it out by realising there's a docker diff <container> command, which lists the paths a container has modified relative to its base image. I'm leaving this here because the same method can be used to find out where any container attempts to store data.
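For reference, this is roughly what that looks like (the container name is a placeholder - use whatever `docker ps` shows for your agent; lines are prefixed A for added, C for changed, D for deleted):

    # Show filesystem changes the running container has made vs its image;
    # writable state directories stand out in the output.
    docker diff my-stack-fleet-server-1

Anything the agent writes under /usr/share/elastic-agent/state shows up here, which is how the directory worth persisting was identified.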