How can I use elasticdump to reindex an entire cluster?


(Jay Greenberg) #1

I love elasticdump! How can I use it to reindex my entire cluster?

Also, if one giant file is too much, is there some script that I can run to split it out into multiple files?


(Jay Greenberg) #2

The commands below dump the whole cluster into a single file and then reindex from that file.

Note that this does not retain mappings, so make sure your dynamic mappings (with the new index settings) are in place before you reindex.

Also keep in mind that dumping the entire cluster into a single file is quick and dirty, and plenty could go wrong.

$ yum install npm
$ npm install elasticdump
$ node_modules/elasticdump/bin/elasticdump \
    --input=http://<es_ip>:9200/ \
    --output=./whole_cluster.json --all=true
$ node_modules/elasticdump/bin/elasticdump \
    --output=http://<es_ip>:9200/ \
    --input=./whole_cluster.json --bulk=true

A safer way might be to dump into multiple files. You could use a script like this:

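#!/bin/bash
# Usage:
#  ./reindex.sh dump
#  ./reindex.sh restore
ES=http://localhost:9200   # no trailing slash, so "$ES/$index" has no double slash
ED=./node_modules/elasticdump/bin/elasticdump

# Dump every index into its own <index>.json file.
dump() {
    for index in $(/usr/bin/curl -s -XGET "$ES/_cat/indices?h=i")
    do
        echo "$index"
        "$ED" --input="$ES/$index" --output="$index.json"
    done
}

# Load every *.json file in the current directory back into the cluster.
restore() {
    for f in *.json
    do
        echo "Processing $f ..."
        "$ED" --bulk=true --input="$f" --output="$ES/"
    done
}

# Run only the two known subcommands; the exit status is that of the last command.
case "$1" in
    dump|restore) "$1" ;;
    *) echo "Usage: $0 dump|restore" >&2; exit 1 ;;
esac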

Be warned that neither of the above methods is necessarily an efficient or safe way to reindex a cluster. Other tools such as stream2es can be parallelized and are better suited to heavy workloads.

Also see the elasticdump docs.
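
If you already have one giant dump file and want it split up after the fact, something along these lines might work. This is only a rough sketch: it assumes the dump holds one JSON document per line and that the first "_index" field on each line is the document's index name. The function name split_dump and the output directory argument are made up for illustration.

```shell
#!/bin/bash
# Split a single dump file into one file per index.
# Assumption: one JSON document per line, each with an "_index" field
# appearing before any user data that might contain the same key.
split_dump() {
    local dump_file=$1 out_dir=${2:-.}
    mkdir -p "$out_dir"
    awk -v dir="$out_dir" '
        match($0, /"_index"[ ]*:[ ]*"[^"]+"/) {
            # Pull the index name out of the matched "_index":"name" text.
            name = substr($0, RSTART, RLENGTH)
            sub(/^"_index"[ ]*:[ ]*"/, "", name)
            sub(/"$/, "", name)
            out = dir "/" name ".json"
            print >> out      # append, so existing files are extended
            close(out)        # avoid running out of open file handles
            next
        }
        # Lines without an _index field go to stderr instead of vanishing.
        { print > "/dev/stderr" }
    ' "$dump_file"
}
```

Running split_dump whole_cluster.json by_index would then leave one <index>.json file per index in by_index/. Because the files are opened in append mode, start with an empty output directory if you rerun it.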


(Mark Walkom) #3

I use Logstash for this: https://gist.github.com/markwalkom/8a7201e3f6ea4354ae06 :)


(Varun Mehta) #4

I finally came up with this script using elasticdump.