index.translog.durability uses the default value, which is request.
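For reference, this is roughly how the setting can be set explicitly through the index settings API (PUT /<index>/_settings). This is a minimal sketch: the cluster address and index name are hypothetical placeholders, and the request is only built here, not sent. With request durability the translog is fsynced before a write is acknowledged. With async it is fsynced in the background, so the most recent acknowledged writes can be lost on a crash. Check the documentation for your version, because very old releases may not support or update this setting the same way.

```python
import json
import urllib.request

# Hypothetical cluster address and index name; replace with your own.
ES_URL = "http://localhost:9200"
INDEX = "my-index"


def translog_durability_request(durability: str = "request") -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request for index.translog.durability."""
    # "request" (the default): fsync the translog before acknowledging each write.
    # "async": fsync in the background, so recently acknowledged writes may be lost on a crash.
    if durability not in ("request", "async"):
        raise ValueError(f"unsupported durability: {durability!r}")
    body = json.dumps({"index.translog.durability": durability}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = translog_durability_request()
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually apply the setting: urllib.request.urlopen(req)
```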
After several tests, I found that the lost data was in the translog of node138 (from the step 5 backup) but not on node137. After a restart and recovery, this data was lost.
When node137 was shut down, some shards had no replica, so the lost data existed only in the translog on node138. Why was the translog data on node138 not restored after the restart?
Can data be lost when Elasticsearch restarts and recovers?
As already pointed out, you are running a very, very old version of Elasticsearch. A lot of work has gone into improving resiliency since then, so I would recommend upgrading to the latest version.
I have not used this version in many years, so my comments may well be wrong. I think the issue here is that you first shut down one node and allow it to fall behind. If you then brought it back while the other nodes were still running, I would not expect any data loss. Because you shut down all nodes and then restart them all at the same time, you cannot know which node will be elected master on startup, nor which shard copies will be selected as primaries during recovery. If, instead of restarting all nodes at the same time, you first restarted the nodes that were active last and then, once a master had been elected, added the node that went down first, I suspect the result would be different.
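The suggested restart order can be written down as a small plan. This is a minimal sketch, assuming you recorded the order in which the nodes were stopped; restart_plan and its step wording are hypothetical, and it does not talk to Elasticsearch at all. It only encodes "start the nodes that were running last, wait for a master, then add the node that went down first".

```python
from typing import List


def restart_plan(shutdown_order: List[str]) -> List[str]:
    """Return human-readable restart steps for a full-cluster restart.

    shutdown_order lists node names in the order they were stopped; the
    first entry is the node that fell behind while the others kept indexing.
    """
    if not shutdown_order:
        return []
    stale, *recent = shutdown_order
    steps = []
    if recent:
        # Start the nodes that were running last, since they hold the newest data.
        steps.append("start together: " + ", ".join(recent))
        # Let them elect a master and recover before the stale node rejoins.
        steps.append("wait until a master is elected and the cluster has recovered")
    # The node that went down first joins last.
    steps.append("start last: " + stale)
    return steps


for step in restart_plan(["node137", "node138"]):
    print(step)
```

For the scenario in this thread, that means bringing node138 up first and only adding node137 once the cluster has a master.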