Docker, ES 7.8.0, and the localhost snapshot repository

I'm trying to build an image from a Dockerfile based on elasticsearch:7.8.0 (sketched below) that will:

  1. Add a backup.tar.gz file to the image and extract it to a backup location.
  2. Change ownership of that folder so that ES has permission to read it.
  3. Access the snapshot API from within that image using curl to load the data into ES.
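Simplified, the Dockerfile looks something like this (file names and paths approximated from my setup):

    FROM elasticsearch:7.8.0

    # 1. ADD auto-extracts the tarball into the backup location
    ADD backup.tar.gz /usr/share/elasticsearch/backup/

    # 2. give the elasticsearch user ownership of the extracted files
    RUN chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/backup

    # 3. register the backup as a snapshot repository (this is the step that fails)
    RUN curl -X PUT "localhost:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d'{"type": "fs", "settings": {"location": "/usr/share/elasticsearch/backup"}}'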

Steps 1 and 2 are working fine; I can see the extracted files with the correct permissions in the running container.

If I sh into the container, my curl commands run just fine.

If I add the exact same curl commands to the Dockerfile, I get an error when running my usual docker-compose up command:

    Service 'elasticsearch7' failed to build: The command '/bin/sh -c curl -X PUT "localhost:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d'{"type": "fs", "settings": {"location": "/usr/share/elasticsearch/backup"}}'' returned a non-zero code: 7

I've tried 127.0.0.1 instead of localhost, as well as a myriad of other options (exit code 7 from curl means it couldn't connect to the host at all). I can always get the command to work when sh'd into the container, but the same command fails during the image build that docker-compose up triggers.

@Jasper_Showers
I am assuming this is a single-node cluster. Instead of a tar of the snapshot repository, why not tar the data directory?

You un-tar it inside the ES data directory, set ownership/permissions, and start ES.

I think the reason your build is failing is that after step 2, ES isn't running yet. The server only starts after the image is built and the container is started; when you docker exec (or sh) into the container, it is already up and ES is running.
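In other words, something like this should work, because by the time you exec, the container is up and ES is listening (service name taken from your error message):

    docker-compose up -d
    docker-compose exec elasticsearch7 curl -X PUT "localhost:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d'{"type": "fs", "settings": {"location": "/usr/share/elasticsearch/backup"}}'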

Yep, a single-node cluster.

Unfortunately I do not control the export process and received the file as a snapshot export.

> I think the reason your build is failing is that after step 2, ES isn't running yet. The server only starts after the image is built and the container is started; when you docker exec (or sh) into the container, it is already up and ES is running.

I believe you are right. Is there a way to delay the script until after ES starts up, or what is the "best way" to do an automated import like this?

Thanks!

You can temporarily start ES, import, and then stop it. But that will bloat your image, as it will contain duplicate data or lots of layers.

If I had to do it, I would write a separate script to convert the snapshot into a data directory, along the following lines (commands are not tested):

    # all commands run on the build machine
    mkdir repo data
    cd repo && tar -zxf backup.tar.gz && cd ..

    # note: the container runs ES as uid 1000, so repo/ and data/ must be
    # readable/writable by that user. single-node discovery lets ES 7.x
    # boot without any extra cluster configuration.
    docker run --rm -d --name temp_es \
               -v $(pwd)/data:/data -v $(pwd)/repo:/repo \
               -e "path.repo=/repo" -e "path.data=/data" \
               -e "discovery.type=single-node" \
               -p 9200:9200 \
               docker.elastic.co/elasticsearch/elasticsearch:7.8.0

    sleep 60   # or run curl localhost:9200 in a loop until you get a good response
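    # then confirm ES is actually answering before continuing
    # (a simple, untested polling loop)
    until curl -s localhost:9200 >/dev/null; do sleep 2; done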

    # register the un-tarred repository; you may have to adjust /repo to
    # /repo/X if the tarball creates another directory level
    curl -X PUT "localhost:9200/_snapshot/my_backup?pretty" -H 'Content-Type: application/json' -d'{"type": "fs", "settings": {"location": "/repo"}}'
  
    # import the snapshot with the wait_for_completion flag; list the
    # snapshot names first and substitute the real one for <snapshot_name>
    curl -s "localhost:9200/_snapshot/my_backup/_all?pretty"
    curl -X POST "localhost:9200/_snapshot/my_backup/<snapshot_name>/_restore?wait_for_completion=true&pretty"
  
    # forcemerge in case the snapshot was taken without it; this will reduce
    # index size and hence image size (_forcemerge requires POST)
    curl -s -X POST "localhost:9200/_forcemerge?max_num_segments=1"

    # optimized data will be written to the $(pwd)/data directory on the build machine

    # stop (rather than kill) the container so ES flushes and shuts down cleanly
    docker stop temp_es

Now you can build your main image, copy the data directory from the build machine into the data directory of your ES, and set ownership to the elasticsearch user.
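For example, a minimal (untested) Dockerfile for that final image, assuming the data directory produced by the script sits next to it:

    FROM docker.elastic.co/elasticsearch/elasticsearch:7.8.0

    # data/ is the directory written by the temp_es container above
    COPY data/ /usr/share/elasticsearch/data/
    RUN chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data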

Thank you for your help btw! I actually had a very "oh duh" moment regarding your first comment:

I figured that since I'm able to make the _snapshot API calls manually, I could just do that once to import the data. Then I compressed THAT data directory and made a container that extracts it into data/nodes.
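Roughly like this (the tarball name here is made up, but ADD auto-extracts it into the data directory):

    FROM elasticsearch:7.8.0

    ADD restored-data.tar.gz /usr/share/elasticsearch/data/
    RUN chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data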

EZPZ. The answer was staring me right in the face.

I will play around with that script a bit as well; I'm curious whether I can make something like that work.
