Elasticsearch bootstrap fails if elasticsearch.keystore.tmp already exists

Environment

  • Elasticsearch version: 7.4.2 (docker)

Problem Summary
If the file /usr/share/elasticsearch/config/elasticsearch.keystore.tmp already exists when Elasticsearch bootstraps, Elasticsearch fails to start.

Steps to reproduce (PRODUCTION)
Start the Elasticsearch Docker container, and stop it while it is creating elasticsearch.keystore (this is very difficult to do manually, but it's exactly what happened in our case).

Steps to reproduce (SIMULATION)
Create a custom docker-compose file that maps an existing, empty elasticsearch.keystore.tmp file from the host to /usr/share/elasticsearch/config/elasticsearch.keystore.tmp, like the following:

version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.2
    environment:
      - node.name=es01
      - cluster.name=es-sample-cluster
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms4g -Xmx4g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - ./elasticsearch.keystore.tmp:/usr/share/elasticsearch/config/elasticsearch.keystore.tmp
    ports:
      - 9200:9200

Expected Result
Elasticsearch starts correctly

Actual Result
Elasticsearch does not start, and fails with the following error:

elasticsearch_1  | Exception in thread "main" org.elasticsearch.bootstrap.BootstrapException: java.nio.file.FileAlreadyExistsException: /usr/share/elasticsearch/config/elasticsearch.keystore.tmp
elasticsearch_1  | Likely root cause: java.nio.file.FileAlreadyExistsException: /usr/share/elasticsearch/config/elasticsearch.keystore.tmp
elasticsearch_1  |      at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
elasticsearch_1  |      at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
elasticsearch_1  |      at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
elasticsearch_1  |      at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
elasticsearch_1  |      at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:478)
elasticsearch_1  |      at java.base/java.nio.file.Files.newOutputStream(Files.java:223)
elasticsearch_1  |      at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:410)
elasticsearch_1  |      at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:406)
elasticsearch_1  |      at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:254)
elasticsearch_1  |      at org.elasticsearch.common.settings.KeyStoreWrapper.save(KeyStoreWrapper.java:484)
elasticsearch_1  |      at org.elasticsearch.bootstrap.Bootstrap.loadSecureSettings(Bootstrap.java:242)
elasticsearch_1  |      at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:305)
elasticsearch_1  |      at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159)
elasticsearch_1  |      at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150)
elasticsearch_1  |      at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
elasticsearch_1  |      at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:125)
elasticsearch_1  |      at org.elasticsearch.cli.Command.main(Command.java:90)
elasticsearch_1  |      at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:115)
elasticsearch_1  |      at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
elasticsearch_1  | Refer to the log for complete error details.
das_elasticsearch_1 exited with code 1

Considerations
I narrowed down the problem to the following method: https://github.com/elastic/elasticsearch/blob/v7.4.2/server/src/main/java/org/elasticsearch/common/settings/KeyStoreWrapper.java#L478

The KeyStoreWrapper.save() method does not explicitly handle java.nio.file.FileAlreadyExistsException, so it simply fails and exits. I would expect that this exception would be handled explicitly, logged, and would allow the service to start all the same...
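To make the expectation concrete, here is a minimal sketch of the kind of handling I have in mind. It is not the actual KeyStoreWrapper code: the class TolerantKeystoreSave and its save() helper are hypothetical, it uses plain java.nio instead of the Lucene directory seen in the stack trace, and it assumes that a leftover temporary file is always an incomplete write that is safe to discard, which may not hold in every situation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class TolerantKeystoreSave {

    // Hypothetical helper, NOT the real KeyStoreWrapper.save():
    // writes to "<name>.tmp" and then renames it to "<name>",
    // tolerating a temporary file left over from an earlier run.
    static void save(Path configDir, String name, byte[] contents) throws IOException {
        Path tmp = configDir.resolve(name + ".tmp");
        Path target = configDir.resolve(name);

        // Assumption: a leftover temp file is an incomplete write and can be discarded.
        if (Files.deleteIfExists(tmp)) {
            System.err.println("Removed stale temporary file " + tmp);
        }

        // CREATE_NEW would throw FileAlreadyExistsException if the temp file still existed.
        try (OutputStream out = Files.newOutputStream(tmp,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            out.write(contents);
        }

        // Rename the finished temp file over the target.
        Files.move(tmp, target,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("es-config");
        // Simulate the file left behind by an interrupted start.
        Files.createFile(dir.resolve("elasticsearch.keystore.tmp"));
        save(dir, "elasticsearch.keystore", new byte[] {1, 2, 3});
        System.out.println("keystore written: "
                + Files.exists(dir.resolve("elasticsearch.keystore")));
    }
}
```

Note that this would not help in the SIMULATION setup above, where the temp file is itself a bind mount and probably cannot be deleted from inside the container; it only targets the production case of a file left in the container's own filesystem.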


Bump... can anyone please advise on this?

I'd like to know if I can open a bug on GitHub for this, or if there's a specific reason why the elasticsearch.keystore step of the bootstrap fails when the temporary file is already present.

Is there any specific reason that you want or need to have an elasticsearch.keystore.tmp file in your mapped volume? It would be a bug if Elasticsearch left this file lying there, and you can't say with certainty that

this exception would be handled explicitly, logged, and would allow the service to start all the same...

as depending on why this temporary file was left there, it might or might not be the right decision to log and overwrite this.

I do not need to map elasticsearch.keystore.tmp in production, it's only a way to replicate the problem.

The situation I had in production was described in my previous post:

The problem happens if the Docker container is stopped (not removed) exactly when KeyStoreWrapper has created elasticsearch.keystore.tmp but has not yet renamed it to elasticsearch.keystore. If this happens, then when the container is restarted, Elasticsearch finds elasticsearch.keystore.tmp in the config folder and crashes with the error above. This condition is hard to replicate, because you have a window of only a few milliseconds in which to stop the container.
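The sequence can be simulated outside Docker with a small sketch. InterruptedSaveDemo and its methods are hypothetical names, and the code only mimics the write-then-rename pattern with plain java.nio; it assumes the temporary file is opened in create-new mode, which is consistent with the FileAlreadyExistsException in the stack trace but is not copied from the Elasticsearch source.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class InterruptedSaveDemo {

    // Hypothetical stand-in for the save sequence: write the temp file, then rename it.
    // stopBeforeRename simulates the container being stopped between the two steps.
    static void save(Path configDir, boolean stopBeforeRename) throws IOException {
        Path tmp = configDir.resolve("elasticsearch.keystore.tmp");
        Path keystore = configDir.resolve("elasticsearch.keystore");

        // CREATE_NEW throws FileAlreadyExistsException if the temp file is already there.
        try (OutputStream out = Files.newOutputStream(tmp,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            out.write(new byte[] {0});
        }
        if (stopBeforeRename) {
            return; // the temp file is left behind and the keystore is never created
        }
        Files.move(tmp, keystore,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    // Returns true if a second start fails after an interrupted first start.
    static boolean restartFailsAfterInterruptedStart(Path configDir) throws IOException {
        save(configDir, true); // first start, stopped before the rename
        try {
            save(configDir, false); // restart with the same config directory
            return false;
        } catch (FileAlreadyExistsException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("es-config");
        System.out.println("restart fails: " + restartFailsAfterInterruptedStart(dir));
    }
}
```

The temporary directory plays the role of the stopped container's filesystem: because the container is stopped rather than removed, the leftover file survives until the next start.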

The workaround, in my case, is just to stop and remove the container (i.e. docker-compose down) and then re-create it (i.e. docker-compose up).

Nonetheless, in our product we would like to avoid this and, if possible, have Elasticsearch be robust against the elasticsearch.keystore.tmp file already being there. Is there any specific reason why the FileAlreadyExistsException is not handled explicitly?

