As has been reported elsewhere here, we too were very surprised at how slow the Reindex API is if you simply follow the documentation and issue a straight reindex.
For reference, we have a 9M-doc / 4 TB index with a simple, clean schema but large documents, on a well-provisioned 10-data-node [+masters/clients] 2.3.4 cluster under minimal load... and it is taking several days to reindex, peaking at best around 100 docs/s and averaging more like 25. This is with an identical schema and no source filtering, rewriting, etc... just a simple pour-over from one shard count to a higher one.
Reading here, I found the suggestion to parallelize manually by effectively 'sharding' the index according to some feature of the source, presumably by filtering against a unique key.
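For anyone else landing here, a minimal sketch of that manual approach: build one _reindex request body per key range, so each range can be POSTed from its own process. The field name "user_id", the index names, and the key space are illustrative assumptions, not from our actual schema:

```python
def partition_reindex_bodies(src, dest, key_field, key_max, n_workers):
    """Build one _reindex request body per contiguous key range,
    so n_workers processes can each POST one body to /_reindex."""
    step = -(-key_max // n_workers)  # ceiling division
    bodies = []
    for lo in range(0, key_max, step):
        bodies.append({
            "source": {
                "index": src,
                # Each worker only scrolls docs whose key falls in its range.
                "query": {"range": {key_field: {"gte": lo, "lt": lo + step}}},
            },
            "dest": {"index": dest},
        })
    return bodies

# Ten non-overlapping ranges covering ids 0 .. 9,000,000.
bodies = partition_reindex_bodies("old_index", "new_index", "user_id", 9_000_000, 10)
```

Each body is then submitted as its own concurrent _reindex call; the ranges are disjoint, so no document is copied twice.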
I'd like to request that, in its stable form (5.x+), the Reindex API at minimum perform concurrent scroll/bulk requests per shard automatically. Presumably the same logic used to route documents to shards by id could also be used to partition the reads into parallel processes.
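To illustrate the routing idea (using CRC32 purely as a stand-in for the murmur3 hash Elasticsearch actually uses for shard routing; the details differ, this is just the shape of the partitioning):

```python
import zlib

def partition_for(doc_id: str, n_partitions: int) -> int:
    """Deterministically assign a document id to one partition,
    mirroring in spirit how _id-based shard routing works."""
    return zlib.crc32(doc_id.encode("utf-8")) % n_partitions

# Every id lands in exactly one bucket, so N scroll/bulk workers
# could each claim the documents whose hash matches their slot.
ids = [f"doc-{i}" for i in range(1000)]
buckets = [[] for _ in range(10)]
for doc_id in ids:
    buckets[partition_for(doc_id, 10)].append(doc_id)
```

Because the assignment is deterministic and the buckets are disjoint, the workers never overlap and together cover the whole index.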
It would be invaluable to automatically make each shard reindex in parallel.
(The ideal, of course, would be a simple option on the initiating request, say "concurrency": 10... It's funny that there are dials for throttling, yet the common experience seems to be that an over-exuberant reindexing process is the least of anyone's worries!)
We can dream...!