How can I use elasticdump to reindex an entire cluster?

PhaedrusTheGreek · January 26, 2016, 8:25pm

I love elasticdump! How can I use it to reindex my entire cluster?

Also, if one giant file is too much, is there some script that I can run to split it out into multiple files?

PhaedrusTheGreek · January 26, 2016, 8:26pm

The commands below dump the whole cluster into a single file and then reindex from that file.

Also, this does not retain mappings, so you'll want to make sure that your dynamic mappings (with the new index settings) are in place prior to the reindex.

Plus, realize that dumping the entire cluster into a single file is quick and dirty, and god knows what could go wrong.

$ yum install npm
$ npm install elasticdump
$ node_modules/elasticdump/bin/elasticdump \
    --input=http://<es_ip>:9200/ \
    --output=./whole_cluster.json --all=true
$ node_modules/elasticdump/bin/elasticdump \
    --output=http://<es_ip>:9200/ \
    --input=./whole_cluster.json --bulk=true
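
If you would rather carry the existing mappings over instead of relying on templates, one option is to copy each index's mapping before its data. This is only a rough sketch: it assumes your elasticdump version supports the `--type=mapping` and `--type=data` options (check `elasticdump --help`), and the function names, the `run` helper, and the `DRY_RUN` switch are my own placeholders, not part of elasticdump.

```shell
#!/bin/bash
# Sketch: dump and restore mapping and data for one index as separate files.
# ES and ED can be overridden from the environment.
ES=${ES:-http://localhost:9200}
ED=${ED:-./node_modules/elasticdump/bin/elasticdump}

# With DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Write <index>_mapping.json and <index>_data.json for the given index.
dump_index() {
    local index=$1
    run "$ED" --input="$ES/$index" --output="${index}_mapping.json" --type=mapping
    run "$ED" --input="$ES/$index" --output="${index}_data.json" --type=data
}

# Load the mapping first, so the data is not indexed against dynamic defaults.
restore_index() {
    local index=$1
    run "$ED" --input="${index}_mapping.json" --output="$ES/$index" --type=mapping
    run "$ED" --input="${index}_data.json" --output="$ES/$index" --type=data
}
```

Source the file and try something like `DRY_RUN=1; dump_index my_index` first (`my_index` is a made-up name) to see the exact commands before running them for real.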

A safer way might be to dump into multiple files. You could use a script like this:

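#!/bin/bash
# Usage:
#  ./reindex.sh dump
#  ./reindex.sh restore
ES=http://localhost:9200    # no trailing slash; paths are appended below
ED=./node_modules/elasticdump/bin/elasticdump

# Dump every index into its own <index>.json file.
dump() {
for index in $(/usr/bin/curl -s -XGET "$ES/_cat/indices?h=i")
do
        echo "$index"
        "$ED" --input="$ES/$index" --output="$index.json"
done
}

# Load every .json file in the current directory back into the cluster.
restore() {
for f in *.json
do
        echo "Processing $f ..."
        "$ED" --bulk=true --input="$f" --output="$ES/"
done
}

# Run the function named by the first argument; reject anything else.
case "$1" in
        dump|restore) "$1" ;;
        *) echo "Usage: $0 {dump|restore}" >&2; exit 1 ;;
esac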

Be warned that neither of the above methods is necessarily an efficient or safe way to reindex a cluster. Other tools such as stream2es can be parallelized and are better suited to heavy workloads.
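
That said, nothing stops you from running several elasticdump processes side by side, one per index. A minimal sketch, assuming your `xargs` supports `-P` (GNU and BSD versions do); `JOBS`, `list_indices`, and `parallel_dump` are names I made up for illustration, and I have not measured how much this helps on a real cluster.

```shell
#!/bin/bash
# Sketch: dump several indices concurrently, one elasticdump process per index.
ES=${ES:-http://localhost:9200}
ED=${ED:-./node_modules/elasticdump/bin/elasticdump}
JOBS=${JOBS:-4}    # how many dumps to run at once

# Print one index name per line.
list_indices() {
    curl -s "$ES/_cat/indices?h=index"
}

# Read index names on stdin and dump each into <index>.json,
# running up to $JOBS dumps at the same time.
parallel_dump() {
    xargs -P "$JOBS" -I{} "$ED" --input="$ES/{}" --output="{}.json"
}
```

Usage would be `list_indices | parallel_dump`. Keep `JOBS` low at first, since every extra process adds load on the cluster.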

Also see the elasticdump docs.

warkolm · January 26, 2016, 10:07pm

I use Logstash for this - https://gist.github.com/markwalkom/8a7201e3f6ea4354ae06

varunmehta · January 27, 2016, 4:45pm

Finally came up with this script using elasticdump.

https://gist.github.com/varunmehta/d0553071dad4171e4dd7

reindex.sh

#!/bin/bash

# Help menu for the script.
usage () {
	echo "Usage: `basename $0` [-h] [-b] [-d] [-r] [-i] [-s] [http://es-ESname:9200]"
	echo ""
	echo "where:   "
	echo "      -h   Show this help text "
	echo "      -b   Backup the elasticsearch indices to .json files "
	echo "      -d   Delete the indices backed up"

This file has been truncated.