Created a Gist documenting my approach to reindexing using Logstash. Hope it helps.
Reindexing an Elasticsearch index can be a pain when you have limited resources and need the cluster to keep serving traffic at the same time. Hence it is advisable to size up the job and break it down into chunks based on time. Look to Kibana: the breakdown is already done for you as you perform your search. Just pop open the request and the aggregation query is right there. Using it, you can tally your document count by time to verify your work.
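For reference, the kind of query Kibana issues for this is a date_histogram aggregation. A minimal sketch of running it directly with curl is below; the index name, timestamp field, and interval are assumptions to adapt to your own data, and the interval parameter name varies across Elasticsearch versions.

```bash
# Count documents per day on the source index (placeholder names).
curl -s -XPOST 'http://localhost:9200/source-index/_search?size=0' \
  -H 'Content-Type: application/json' -d '
{
  "aggs": {
    "docs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d"
      }
    }
  }
}'
```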
I need to do this because of resource constraints. The Logstash input plugin sometimes hits an error and restarts, and when it restarts the query gets executed again. With logstash-input-elasticsearch, it resumes with a new search; any previous scroll ID is discarded. This is something you do not want happening: you can end up with more documents in the target than in the source (i.e. corruption). Breaking the job into chunks limits the corruption and makes remediation easier. The script automates executing the Logstash configs one after another; doing it manually would be costly in terms of time.
So the strategy is like this:
1) Create a Logstash config template with {START} and {END} tags, which we replace with the actual time values using sed.
2) Create an input.data file with two values per line: the START and END epoch times.
3) The script loops through the input, generates the actual Logstash config file from the template, and executes it (see the sketch after this list).
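Below is a minimal sketch of what the template and driver script could look like. The file names (template.conf, input.data, chunk.conf), hosts, index names, and field name are placeholders I've assumed for illustration; the actual Gist may differ in details.

```bash
#!/usr/bin/env bash
# Driver script: one Logstash run per time chunk.

# The template is kept as a separate file; written here for completeness.
# {START} and {END} are the tags that sed will substitute.
cat > template.conf <<'EOF'
input {
  elasticsearch {
    hosts => ["source-host:9200"]
    index => "source-index"
    query => '{ "query": { "range": { "@timestamp": { "gte": {START}, "lt": {END}, "format": "epoch_millis" } } } }'
  }
}
output {
  elasticsearch {
    hosts => ["target-host:9200"]
    index => "target-index"
  }
}
EOF

# input.data holds one chunk per line: "<START_EPOCH> <END_EPOCH>", e.g.
#   1483228800000 1483315200000
#   1483315200000 1483401600000

while read -r START END; do
  [ -z "$START" ] && continue
  # Substitute the tags and write the per-chunk config.
  sed -e "s/{START}/$START/g" -e "s/{END}/$END/g" template.conf > chunk.conf
  echo "Reindexing chunk $START -> $END"
  # Stop on failure so the offending chunk can be re-run on its own.
  logstash -f chunk.conf || { echo "Chunk $START-$END failed"; exit 1; }
done < input.data
```

Each line of input.data drives one self-contained Logstash run, so if a chunk fails you only have to clean up and re-run that chunk rather than the whole reindex.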
In my experience, with approximately 1 GB of memory you should process roughly 30-50K documents per iteration.
Dependencies: Logstash (preferably on the PATH), Cygwin (for Windows), sed.
Assumption: everything is happening in the current directory.
Lastly, use a diff tool to compare the source and target aggregation results to verify the process.
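As a rough sketch, assuming the same placeholder hosts, index names, and field as above, the verification could be done with curl and diff by running the same aggregation on both sides and comparing only the bucket counts (the "took" time and shard stats will always differ).

```bash
AGG='{ "size": 0, "aggs": { "docs_over_time": { "date_histogram": { "field": "@timestamp", "interval": "1d" } } } }'

curl -s -XPOST 'http://source-host:9200/source-index/_search' \
  -H 'Content-Type: application/json' -d "$AGG" > source.json
curl -s -XPOST 'http://target-host:9200/target-index/_search' \
  -H 'Content-Type: application/json' -d "$AGG" > target.json

# Compare per-bucket document counts only.
diff <(grep -o '"doc_count":[0-9]*' source.json) \
     <(grep -o '"doc_count":[0-9]*' target.json)
```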