Data is lost after Elasticsearch restart

The cluster (Elasticsearch 2.4.0) contains three nodes (node137, node138, node139). I created an index with 5 shards and 1 replica.
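For reference, the index was created roughly like this (a sketch against the ES 2.x REST API; the index name test_index is my assumption):

 # Create the test index with 5 shards and 1 replica (the 2.x defaults).
 curl -XPUT 'http://192.168.59.137:9200/test_index' -d '{
   "settings": {
     "number_of_shards": 5,
     "number_of_replicas": 1
   }
 }'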

I followed these steps to test:

  1. The cluster status is green.
  2. Shut down node137.
  3. Index a large amount of data.
  4. Shut down node138 and node139 immediately.
  5. Copy the Elasticsearch data folder as a backup.
  6. Start node137, node138, and node139.
  7. Query the cluster; some data is lost (see the checks sketched below).
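For steps 1 and 7, the cluster status and document count can be checked roughly like this (a sketch; the index name test_index is my assumption):

 # Step 1: confirm the cluster status is green before the test.
 curl 'http://192.168.59.137:9200/_cluster/health?pretty'

 # Step 7: after the restart, compare the document count with the
 # number of documents that were indexed.
 curl 'http://192.168.59.137:9200/test_index/_count?pretty'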

elasticsearch.yml

 http.port: 9200
 transport.tcp.port: 9300
 discovery.zen.ping.unicast.hosts: ["192.168.59.137:9300","192.168.59.138:9300","192.168.59.139:9300"]
 discovery.zen.minimum_master_nodes: 2
 gateway.recover_after_nodes: 2

index.translog.durability uses the default value, which is request.
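This can be verified per index over the REST API (a sketch; the index name test_index is my assumption):

 # Show the index settings; index.translog.durability was left at the
 # default "request", i.e. the translog is fsynced and committed before
 # each index/delete/bulk request is acknowledged.
 curl 'http://192.168.59.137:9200/test_index/_settings?pretty'

 # The "async" alternative, which fsyncs on an interval and can lose
 # recent operations on a crash, was NOT used:
 # curl -XPUT 'http://192.168.59.137:9200/test_index/_settings' -d '
 # {"index.translog.durability": "async"}'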

After several tests, I found that the lost data was present in the translog on node138 (in the backup from step 5) but not on node137. After the restart and recovery, this part of the data was lost.

When node137 is shut down, some shards have no replica, so the lost data exists only in the translog on node138. After the restart, why isn't the translog data on node138 restored?
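The recovery after the restart can be inspected like this (a sketch; the index name test_index is my assumption):

 # Per-shard recovery details; the translog section shows how many
 # operations were replayed for each shard copy.
 curl 'http://192.168.59.137:9200/test_index/_recovery?pretty'

 # Which copies ended up as primaries, and on which nodes:
 curl 'http://192.168.59.137:9200/_cat/shards/test_index?v'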

Why does this happen? Can data be lost after an Elasticsearch restart and recovery?

You need to provide a lot more context about this issue and the tests you made.

First, what version are you using? What is the configuration of your nodes? Please share the elasticsearch.yml from your three nodes.

What tests did you do? What do you have in the logs of those nodes?

Welcome to our community! :smiley:

This is positively ancient and you need to upgrade as a matter of serious urgency.

As already pointed out, you are running a very very old version of Elasticsearch. A lot of work has gone into improving resiliency since then so I would recommend upgrading to the latest version.

I have not used this version in many years, so any comments may very well be wrong. The issue here, I think, is that you first shut down one node and allow it to fall behind. If you then brought it back while the other nodes were still running, I would expect you to not see any data loss. The fact that you shut down all nodes and then restart them all at the same time means that you do not know which node will be elected as master on startup, nor which shard copies will be selected as primaries during recovery. If, instead of restarting all nodes at the same time, you first restarted the nodes that were active last and then, once a master has been elected, added the node that went down first, I suspect the result would be different.
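A sketch of that staged restart (assuming the nodes run Elasticsearch as a service; the service name and commands are illustrative):

 # 1. Start only node138 and node139, the nodes that were active last:
 sudo service elasticsearch start   # run on node138 and node139

 # 2. Wait until a master has been elected and the cluster has formed
 #    (gateway.recover_after_nodes: 2 is satisfied by these two nodes):
 curl 'http://192.168.59.138:9200/_cat/master?v'
 curl 'http://192.168.59.138:9200/_cluster/health?pretty'

 # 3. Only then start node137, so its stale shard copies are recovered
 #    from the up-to-date primaries instead of being promoted:
 sudo service elasticsearch start   # run on node137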
