We have around 80 Elasticsearch clusters in our production environment, holding terabytes of data.
We are currently running elasticsearch-1.4.1-SNAPSHOT and are planning to upgrade to 2.1.1.
In 2.x, multiple path.data striping has changed: a shard is no longer striped across several data paths. In our configuration we use multiple data directories, like this:
path.data: ["/mnt1/data/es","/mnt2/data/es"]
Dual feeding is something you'd have to implement on the application side.
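A minimal sketch of what application-side dual feeding could look like, assuming your application already has some function that writes one document to one cluster. `DualFeeder` and the `Writer` callables are hypothetical names, not an Elasticsearch client API: every write goes to the old cluster (which stays authoritative) and is mirrored to the new one, and writes the new cluster rejects are kept for a later replay instead of failing the application.

```python
from typing import Any, Callable, Dict, List, Tuple

Document = Dict[str, Any]
# Hypothetical writer: takes an index name and a document, raises on failure.
Writer = Callable[[str, Document], None]


class DualFeeder:
    """Mirror every write to the old (1.x) and the new (2.x) cluster."""

    def __init__(self, primary: Writer, secondary: Writer) -> None:
        self.primary = primary      # old cluster, still the source of truth
        self.secondary = secondary  # new cluster being filled in parallel
        self.failed: List[Tuple[str, Document]] = []

    def index(self, index: str, doc: Document) -> None:
        # Errors from the old cluster propagate to the caller as before.
        self.primary(index, doc)
        try:
            self.secondary(index, doc)
        except Exception:
            # Don't break the application because the new cluster hiccuped;
            # remember the write so it can be replayed later.
            self.failed.append((index, doc))

    def replay_failed(self) -> int:
        """Retry writes the new cluster rejected; return how many succeeded."""
        pending, self.failed = self.failed, []
        succeeded = 0
        for index, doc in pending:
            try:
                self.secondary(index, doc)
                succeeded += 1
            except Exception:
                self.failed.append((index, doc))
        return succeeded
```

In a real application the two writers would wrap whatever client you use for each cluster; keeping them as plain callables makes it easy to swap one side out once the new cluster has caught up.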
I don't know if there is anything you can do to speed up the path.data migration.
It might be worth getting a sense of how much of the time goes to the path.data
migration and how much is due to replicas drifting from the primary's data. 1.4.1
doesn't have synced flush, so even if you aren't writing data, the restarted
processes can't know that the primary and the replica hold the same data if their
files are different.
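On versions that do have synced flush (not 1.4.1), it is triggered with `POST /_flush/synced` (or `POST /<index>/_flush/synced`) once indexing has stopped and before the restart. The sketch below only builds that request with the standard library; the host `http://localhost:9200` and the `logs` index are placeholder assumptions, and you would send the request with `urllib.request.urlopen` and check the response for failed shards.

```python
import urllib.request


def synced_flush_request(host: str = "http://localhost:9200",
                         index: str = "") -> urllib.request.Request:
    """Build (but do not send) a synced-flush request.

    An empty index flushes all indices: POST /_flush/synced.
    The default host is a placeholder for your own cluster address.
    """
    path = f"/{index}/_flush/synced" if index else "/_flush/synced"
    return urllib.request.Request(host.rstrip("/") + path, method="POST")
```

For example, `synced_flush_request(index="logs")` targets `http://localhost:9200/logs/_flush/synced`.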
Another old-school trick is to build a new replica that replaces the old one
but has its bits copied from the primary before the restart. It will probably
shorten the time to yellow after the restart, at the cost of the same amount of
time before the restart. You do it with the API for moving shards: just move the
replicas off the nodes they are currently on. After 1.7 this kind of trickery
should be unnecessary thanks to synced flush. It won't do anything for the
path.data shuffling, but that might not take all that long.
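The shard-moving API referred to here is the cluster reroute API (`POST /_cluster/reroute`) with `move` commands. A minimal sketch that builds such a request, assuming you have already looked up which node holds each replica; the index name, shard number and node names in the example are hypothetical, and the host is a placeholder.

```python
import json
import urllib.request
from typing import List, Tuple

# (index, shard number, node the replica is on now, node to move it to)
Move = Tuple[str, int, str, str]


def reroute_request(moves: List[Move],
                    host: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a cluster reroute request with move commands."""
    commands = [
        {"move": {"index": index, "shard": shard,
                  "from_node": src, "to_node": dst}}
        for index, shard, src, dst in moves
    ]
    body = json.dumps({"commands": commands}).encode("utf-8")
    return urllib.request.Request(
        host.rstrip("/") + "/_cluster/reroute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `reroute_request([("logs-2016.01", 0, "node-a", "node-b")])` asks the cluster to rebuild shard 0's replica on `node-b`; the new copy is recovered from the primary, which is what gives it fresh files before the restart.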