Docker, ES 7.8.0, and the localhost snapshot repository

I'm trying to build an image from a Dockerfile based on elasticsearch:7.8.0 (sketched below) that will:

  1. Add a backup.tar.gz file to the image and extract it to a backup location.
  2. Change ownership of that folder so that ES has permission to read it.
  3. Access the snapshot API from within that image using curl to load the data into ES.
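Simplified, the Dockerfile looks something like this (file names and paths approximated from my setup):

    FROM elasticsearch:7.8.0

    # 1. ADD auto-extracts the tarball into the backup location
    ADD backup.tar.gz /usr/share/elasticsearch/backup/

    # 2. give the elasticsearch user ownership of the extracted files
    RUN chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/backup

    # 3. register the backup as a snapshot repository (this is the step that fails)
    RUN curl -X PUT "localhost:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d'{"type": "fs", "settings": {"location": "/usr/share/elasticsearch/backup"}}'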

Steps 1 and 2 are working fine; I can see the extracted files with the correct permissions in the running container.

If I sh into the container, my curl commands run just fine.

If I add the exact same curl commands to the Dockerfile, I get an error when running my usual docker-compose up command:

    Service 'elasticsearch7' failed to build: The command '/bin/sh -c curl -X PUT "localhost:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d'{"type": "fs", "settings": {"location": "/usr/share/elasticsearch/backup"}}'' returned a non-zero code: 7

I've tried 127.0.0.1 instead of localhost, as well as a myriad of other options (exit code 7 from curl means it couldn't connect to the host at all). I can always get the command to work when sh'd into the container, but the same command fails during the image build that docker-compose up triggers.

@Jasper_Showers
I am assuming this is a single-node cluster. Instead of a tar of the snapshot repository, why not tar the data directory?

You un-tar it inside the ES data directory, set ownership/permissions, and start ES.

I think the reason your build is failing is that after step 2, ES isn't running yet. The server only starts after the image is built and the container is started; when you docker exec (or sh) into the container, it is already up and ES is running.
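In other words, something like this should work, because by the time you exec, the container is up and ES is listening (service name taken from your error message):

    docker-compose up -d
    docker-compose exec elasticsearch7 curl -X PUT "localhost:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d'{"type": "fs", "settings": {"location": "/usr/share/elasticsearch/backup"}}'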

Yep, a single-node cluster.

Unfortunately I do not control the export process and received the file as a snapshot export.

> I think the reason your build is failing is that after step 2, ES isn't running yet. The server only starts after the image is built and the container is started; when you docker exec (or sh) into the container, it is already up and ES is running.

I believe you are right. Is there a way to delay the script until after ES starts up, or what is the "best way" to do an automated import like this?

Thanks!

You can temporarily start ES, import, and then stop it. But that will bloat your image, as it will contain duplicate data or lots of layers.

If I had to do it, I would write a separate script to convert the snapshot into a data directory, along the following lines (commands are not tested):

    # all commands run on the build machine
    mkdir repo data
    cd repo && tar -zxf backup.tar.gz && cd ..

    # note: the container runs ES as uid 1000, so repo/ and data/ must be
    # readable/writable by that user. single-node discovery lets ES 7.x
    # boot without any extra cluster configuration.
    docker run --rm -d --name temp_es \
               -v $(pwd)/data:/data -v $(pwd)/repo:/repo \
               -e "path.repo=/repo" -e "path.data=/data" \
               -e "discovery.type=single-node" \
               -p 9200:9200 \
               docker.elastic.co/elasticsearch/elasticsearch:7.8.0

    sleep 60   # or run curl localhost:9200 in a loop until you get a good response
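    # then confirm ES is actually answering before continuing
    # (a simple, untested polling loop)
    until curl -s localhost:9200 >/dev/null; do sleep 2; done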

    # register the un-tarred repository; you may have to adjust /repo to
    # /repo/X if the tarball creates another directory level
    curl -X PUT "localhost:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d'{"type": "fs", "settings": {"location": "/repo"}}'
  
    # import the snapshot with the wait_for_completion flag; list the
    # snapshot names first and substitute the real one for <snapshot_name>
    curl -s "localhost:9200/_snapshot/my_backup/_all?pretty"
    curl -X POST "localhost:9200/_snapshot/my_backup/<snapshot_name>/_restore?wait_for_completion=true&pretty"
  
    # forcemerge in case the snapshot was taken without it; this will reduce
    # index size and hence image size (_forcemerge requires POST)
    curl -s -X POST "localhost:9200/_forcemerge?max_num_segments=1"

    # optimized data will be written to the $(pwd)/data directory on the build machine

    # stop (rather than kill) the container so ES flushes and shuts down cleanly
    docker stop temp_es

Now you can build your main image, copy the data directory from the build machine into the data directory of your ES, and set ownership to the elasticsearch user.
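For example, a minimal (untested) Dockerfile for that final image, assuming the data directory produced by the script sits next to it:

    FROM docker.elastic.co/elasticsearch/elasticsearch:7.8.0

    # data/ is the directory written by the temp_es container above
    COPY data/ /usr/share/elasticsearch/data/
    RUN chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data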

Thank you for your help btw! I actually had a very "oh duh" moment regarding your first comment:

I figured that since I'm able to make the _snapshot API calls manually, I could just do that once to import the data. Then I compressed THAT data directory and made a container that extracts it into data/nodes.
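Roughly like this (the tarball name here is made up, but ADD auto-extracts it into the data directory):

    FROM elasticsearch:7.8.0

    ADD restored-data.tar.gz /usr/share/elasticsearch/data/
    RUN chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data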

EZPZ. The answer was staring me right in the face.

I will play around with that script a bit as well; I'm curious whether I can make something like that work.
