Version conflict while restarting cluster node

Hi,

I have an Elasticsearch cluster with 4 nodes.

I have a data stream named "log_data", which is configured with 2 shards and 1 replica.

I have a test script that indexes new documents into this data stream in an endless loop (roughly 2000 documents every 30 seconds) with the following curl command:
curl -H "Content-Type: application/json" -X POST https://<random_node>/log_data/_doc -d '{ "message": "<RANDOM STRING>", "@timestamp": "<TIMESTAMP>"}'
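For reference, the loop can be sketched in Python roughly like this. It is only a sketch of the same request as the curl command above, not my actual script: the function names, the fixed `count` (my real loop is endless), the 32-character random message, and the 10-second timeout are all made up for illustration, and authentication/TLS settings are left out.

```python
import json
import random
import string
import urllib.error
import urllib.request
from datetime import datetime, timezone


def build_request(node, message, timestamp):
    """Build the same POST as the curl command: https://<node>/log_data/_doc."""
    body = json.dumps({"message": message, "@timestamp": timestamp}).encode("utf-8")
    return urllib.request.Request(
        f"https://{node}/log_data/_doc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send one request and return (status, body); status is None if no HTTP reply arrived."""
    try:
        with urllib.request.urlopen(request, timeout=10) as resp:  # timeout is an assumption
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Elasticsearch answered with an error status, e.g. 409 for a version conflict.
        return err.code, err.read().decode("utf-8")
    except OSError as err:
        # Connection refused/reset while a node restarts (URLError is an OSError).
        return None, str(err)


def index_documents(nodes, count, send_fn=send):
    """Index `count` random documents via random nodes; return the non-201 responses."""
    failures = []
    for _ in range(count):
        node = random.choice(nodes)
        message = "".join(random.choices(string.ascii_letters, k=32))
        timestamp = datetime.now(timezone.utc).isoformat()
        status, body = send_fn(build_request(node, message, timestamp))
        if status != 201:  # 201 Created is the normal reply for a newly indexed document
            failures.append((status, body))
    return failures
```

Calling `index_documents(["<node_1>", "<node_2>"], 2000)` with real host names would return a list of `(status, body)` pairs for every request that did not come back as 201, which is where the 409 responses show up.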

While my script is indexing documents, I restart the Elasticsearch nodes one after the other (never 2 nodes at the same time!) to test failover.

Occasionally, my indexing script gets the following error:

{"error":{"root_cause":[{"type":"version_conflict_engine_exception","reason":"[IfKYF3kBIdh4-KSkheh_]: version conflict, document already exists (current version [1])","index_uuid":"CEXrhdvISd2uj0nkprC0gA","shard":"1","index":".ds-log_data-000001"}],"type":"version_conflict_engine_exception","reason":"[IfKYF3kBIdh4-KSkheh_]: version conflict, document already exists (current version [1])","index_uuid":"CEXrhdvISd2uj0nkprC0gA","shard":"1","index":".ds-log_data-000001"},"status":409}

At the same moment I get the error above, one Elasticsearch node has started its shutdown process (triggered with 'systemctl restart elasticsearch'). In the Elasticsearch logs, I see the following messages:

[2021-04-28T07:00:25,871][INFO ][o.e.n.Node               ] [srv-a03.local] stopping ...
[2021-04-28T07:00:25,872][INFO ][c.a.o.s.a.r.AuditMessageRouter] [srv-a03.local] Closing AuditMessageRouter
[2021-04-28T07:00:25,872][INFO ][c.a.o.s.a.s.SinkProvider ] [srv-a03.local] Closing DebugSink
[2021-04-28T07:00:25,873][INFO ][o.e.x.w.WatcherService   ] [srv-a03.local] stopping watch service, reason [shutdown initiated]
[2021-04-28T07:00:25,874][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [srv-a03.local] [controller/3297] [Main.cc@154] ML controller exiting
[2021-04-28T07:00:25,874][INFO ][o.e.x.w.WatcherLifeCycleService] [srv-a03.local] watcher has stopped and shutdown
[2021-04-28T07:00:25,874][INFO ][o.e.x.m.p.NativeController] [srv-a03.local] Native controller process has stopped - no new native processes can be started
[2021-04-28T07:00:26,046][DEBUG][o.e.a.b.TransportShardBulkAction] [srv-a03.local] retrying action that failed in 46ms
org.elasticsearch.transport.NodeDisconnectedException: [srv-a03-a01.local][10.100.12.90:9300][indices:data/write/bulk[s][r]] disconnected
[2021-04-28T07:00:26,052][ERROR][i.n.u.c.D.rejectedExecution] [srv-a03.local] Failed to submit a listener notification task. Event loop shut down?

However, the document named in the error message is indexed in my cluster. I did not specify a document ID.
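To check this, the document ID and backing index can be pulled out of the 409 response and looked up with a GET on the backing index. Below is a minimal Python sketch of that step. The helper names are hypothetical, and it assumes the "reason" string always starts with the ID in square brackets, as it does in the error above.

```python
import json
import re
from urllib.parse import quote


def conflict_location(error_body):
    """Return (backing_index, doc_id) from a version conflict response, else None."""
    error = json.loads(error_body)["error"]
    if error.get("type") != "version_conflict_engine_exception":
        return None
    # Assumption: the reason starts with "[<doc_id>]:", as in the error shown above.
    match = re.match(r"\[([^\]]+)\]", error.get("reason", ""))
    if match is None:
        return None
    return error["index"], match.group(1)


def lookup_path(index, doc_id):
    """Path for GET /<index>/_doc/<id>, addressed to the backing index."""
    return f"/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"


# Shortened copy of the error response from above (root_cause omitted).
sample = json.dumps({
    "error": {
        "type": "version_conflict_engine_exception",
        "reason": "[IfKYF3kBIdh4-KSkheh_]: version conflict, "
                  "document already exists (current version [1])",
        "index_uuid": "CEXrhdvISd2uj0nkprC0gA",
        "shard": "1",
        "index": ".ds-log_data-000001",
    },
    "status": 409,
})

index, doc_id = conflict_location(sample)
print(lookup_path(index, doc_id))
```

A GET on that path against any node is how I confirm that the document from the error message really exists.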

Why am I getting this error message when the document was indexed successfully?

I am confused

Thanks for your help!
