What is the timeout for writing to a replica shard?

oraclept · March 26, 2019, 3:59pm

What is the timeout for writing to a replica shard? If a write to a replica is taking a long time, can I set a timeout for writes on the replica shard?

I am reading the link below:

https://www.elastic.co/guide/en/elasticsearch/guide/current/distrib-write.html

Valid write consistency values: one, quorum, and all

Because of the exception below, the index is turning yellow and shards are being reallocated.

[2019-02-20T13:10:11,644][WARN ][o.e.c.a.s.ShardStateAction] [SEIGWPP05ES03] [sample_index][3] received shard failed for shard id [[sample_index][3]], allocation id [OKXJ2CKNR8CKilqJZpVe4Q], primary term [1], message [failed to perform indices:data/write/bulk[s] on replica [sample_index][3], node[IiIb5vQNSj6GAxyv2LbWBQ], [R], s[STARTED], a[id=OKXJ2CKNR8CKilqJZpVe4Q]], failure [NodeNotConnectedException[[samplenode][172.18.72.120:9300] Node not connected]]

DavidTurner · March 26, 2019, 4:57pm

No, there is no timeout. The primary waits indefinitely for the replica to respond (or for the connection to be closed).
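To make the "no timeout" behaviour concrete, here is a minimal sketch, assuming a hypothetical `ReplicaConnection` class that stands in for the transport layer (it is not Elasticsearch code). The primary awaits the replica's reply with no timer attached. The wait ends only when the replica responds, or when the connection is closed, which fails every request still outstanding on it.

```python
import asyncio


class ReplicaConnection:
    """Hypothetical stand-in for a transport connection to a replica node."""

    def __init__(self) -> None:
        self._pending: list = []

    def send(self, request: str) -> asyncio.Future:
        # Each request gets a future that is completed by respond() or close().
        fut = asyncio.get_running_loop().create_future()
        self._pending.append(fut)
        return fut

    def respond(self, result: str) -> None:
        # Complete the oldest outstanding request successfully.
        fut = self._pending.pop(0)
        if not fut.done():
            fut.set_result(result)

    def close(self) -> None:
        # Closing the connection fails every request still waiting on it.
        for fut in self._pending:
            if not fut.done():
                fut.set_exception(ConnectionError("node not connected"))
        self._pending.clear()


async def replicate(conn: ReplicaConnection, request: str) -> str:
    # Deliberately no asyncio.wait_for(..., timeout=...): the primary waits
    # until the replica answers or the connection goes away.
    try:
        return await conn.send(request)
    except ConnectionError as exc:
        return f"replica failed: {exc}"


async def main() -> list:
    loop = asyncio.get_running_loop()
    ok_conn, dead_conn = ReplicaConnection(), ReplicaConnection()
    ok_task = asyncio.create_task(replicate(ok_conn, "index doc 1"))
    dead_task = asyncio.create_task(replicate(dead_conn, "index doc 1"))
    await asyncio.sleep(0)  # let both tasks send their requests
    loop.call_later(0.05, ok_conn.respond, "ack")
    loop.call_later(0.10, dead_conn.close)
    return await asyncio.gather(ok_task, dead_task)


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['ack', 'replica failed: node not connected']
```

A slow but healthy replica simply delays the acknowledgement; only a closed connection turns into a failure.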

The consistency parameter mentioned in your link no longer exists.
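For context: in 5.0 and later the closest successor to `consistency` is the `wait_for_active_shards` request parameter (with the `index.write.wait_for_active_shards` index setting as its default). It is a pre-flight check on how many shard copies must be active before the write starts, not an acknowledgement quorum, and it defaults to 1 (the primary only). The helpers below are hypothetical and only mirror those documented semantics. They are not Elasticsearch code.

```python
from typing import Union


def required_active_copies(setting: Union[int, str], number_of_replicas: int) -> int:
    """Hypothetical helper: active copies demanded by a wait_for_active_shards value."""
    total_copies = number_of_replicas + 1  # the primary plus its replicas
    if setting == "all":
        return total_copies
    count = int(setting)
    if not 1 <= count <= total_copies:
        raise ValueError(f"expected 1..{total_copies} or 'all', got {setting!r}")
    return count


def may_start_write(setting: Union[int, str], number_of_replicas: int, active_copies: int) -> bool:
    # This is a pre-flight check only. If it fails, the request waits (up to the
    # request's `timeout`) for more copies to start. Once the write begins, it is
    # still sent to every in-sync copy regardless of this value.
    return active_copies >= required_active_copies(setting, number_of_replicas)


if __name__ == "__main__":
    # The index in this thread has 2 replicas, i.e. 3 copies of each shard.
    print(required_active_copies(1, 2))                # 1 (the default: primary only)
    print(required_active_copies("all", 2))            # 3
    print(may_start_write("all", 2, active_copies=2))  # False
```

Note that the request `timeout` here bounds only the wait for active shards before the write. It is not a timeout on the replica write itself.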

oraclept · March 26, 2019, 5:52pm

@DavidTurner I really appreciate your reply.

We have a 5-node Elasticsearch 6.1 cluster with an index that has 10 primary shards and 2 replicas. Does this mean it has to write to 2 shards before it sends an ack?

How can I prevent the index from turning yellow? Or can I make it skip the check on the replica copy, regardless of whether the write to the replica succeeded?

failed to perform indices:data/write/bulk[s] on replica

We are frequently getting the above exception.

Is the connection timeout an Elasticsearch parameter?

What is the default ack to the application? Quorum?

I checked the master version of the document:

https://www.elastic.co/guide/en/elasticsearch/guide/master/distrib-write.html

DavidTurner · March 26, 2019, 6:27pm

No, each write goes to the primary and both replicas before acknowledgement, so that's 3 shard copies in total.

Elasticsearch guarantees to write every document to every in-sync shard copy before responding. This means that if it can't write to a shard copy it must mark that copy as out-of-sync, which will mean it becomes unassigned and therefore that the index health reports as yellow.
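The rule above can be sketched as a toy model, assuming hypothetical names (`ShardGroup`, `write`, and a `reachable` set of nodes). This is not Elasticsearch's implementation. It models one shard of the index in this thread (a primary plus 2 replicas) and shows why an unreachable replica cannot simply be skipped: it has to leave the in-sync set, and that is what turns the index yellow. In the real system the primary reports such a copy to the master, which is what the `ShardStateAction` "received shard failed" log line in this thread records.

```python
from dataclasses import dataclass, field


@dataclass
class ShardGroup:
    """Toy model of one shard's copies (hypothetical, not Elasticsearch code)."""
    primary: str
    replicas: list
    in_sync: set = field(init=False)

    def __post_init__(self) -> None:
        # Initially every copy is in sync.
        self.in_sync = {self.primary, *self.replicas}

    def health(self) -> str:
        # A replica dropped from the in-sync set is treated as unassigned,
        # so the index reports yellow until that copy is recovered.
        return "green" if len(self.in_sync) == 1 + len(self.replicas) else "yellow"


def write(group: ShardGroup, reachable: set) -> list:
    """Write to the primary and every in-sync replica before acknowledging."""
    acked = [group.primary]  # assume the primary write itself succeeds
    for replica in group.replicas:
        if replica not in group.in_sync:
            continue  # copies already marked out-of-sync are not written to
        if replica in reachable:
            acked.append(replica)
        else:
            # The copy cannot be skipped silently: it is marked out-of-sync.
            group.in_sync.discard(replica)
    return acked


if __name__ == "__main__":
    group = ShardGroup("node-1", ["node-2", "node-3"])
    print(write(group, {"node-2", "node-3"}), group.health())  # ['node-1', 'node-2', 'node-3'] green
    print(write(group, {"node-3"}), group.health())  # ['node-1', 'node-3'] yellow
```

Each write touches 3 copies here, and the index in this thread has 10 such groups, one per primary.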

There is no way to disregard this check. It's very important.

There will be more information in the logs telling you why. If you need help interpreting the logs then please share more information here - stack traces and other messages from around the same time are all important.

The guide to which you link was written about the 2.x series and is rather out of date. The reference manual has fresher information.

oraclept · March 26, 2019, 7:54pm

Thanks @DavidTurner for the valuable information.
Below is the stack trace. I see NodeNotConnectedException errors.
I don't see any exceptions on the disconnected node. Should I increase the node timeout value?

> [2019-03-26T02:16:55,992][WARN ][o.e.c.a.s.ShardStateAction] [sample_node_1] [sample_index][3] received shard failed for shard id [[sample_index][3]], allocation id [-HYXaiLbTRSUyFyRhTO6Dw], primary term [3], message [failed to perform indices:data/write/bulk[s] on replica [sample_index][3], node[IiIb5vQNSj6GAxyv2LbWBQ], [R], s[STARTED], a[id=-HYXaiLbTRSUyFyRhTO6Dw]], failure [NodeNotConnectedException[[sample_node_2][172.18.72.120:9300] Node not connected]]

org.elasticsearch.transport.NodeNotConnectedException: [sample_node_2][172.18.72.120:9300] Node not connected
        at org.elasticsearch.transport.TcpTransport.getConnection(TcpTransport.java:692) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.transport.TcpTransport.getConnection(TcpTransport.java:122) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:525) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:501) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction.sendReplicaRequest(TransportReplicationAction.java:1188) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicasProxy.performOn(TransportReplicationAction.java:1152) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplica(ReplicationOperation.java:171) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplicas(ReplicationOperation.java:155) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:122) ~[elasticsearch-6.1.0.jar:6.1.0]

DavidTurner · March 27, 2019, 8:26am

By default Elasticsearch does log something when a node is disconnected, and continues to log failure messages every few minutes if it can't reconnect. You're not sharing very much in the way of logs so it's not very easy to help here. Can you share the last few minutes of logs leading up to this NodeNotConnectedException? Perhaps use https://gist.github.com.
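One way to collect "the last few minutes of logs" is a small script like the one below. This is a minimal sketch, assuming the default 6.x log layout in which each entry starts with a timestamp such as `[2019-03-26T02:16:55,992]`, as in the lines quoted above. The function names are hypothetical. Unstamped lines such as stack-trace frames are attributed to the preceding entry.

```python
import re
import sys
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

# Assumes entries start like [2019-03-26T02:16:55,992] (default 6.x layout).
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}),(\d{3})\]")


def parse_ts(line: str) -> Optional[datetime]:
    m = TIMESTAMP.match(line)
    if not m:
        return None  # continuation line, e.g. a stack-trace frame
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    return ts + timedelta(milliseconds=int(m.group(2)))


def lines_before(lines: Iterable[str], marker: str = "NodeNotConnectedException",
                 minutes: int = 5) -> List[str]:
    """Lines from `minutes` before the first line containing `marker`, up to that line."""
    last_ts = None
    stamped = []  # (timestamp of the entry this line belongs to, line)
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            last_ts = ts
        stamped.append((last_ts, line))
    for i, (ts, line) in enumerate(stamped):
        if marker in line and ts is not None:
            start = ts - timedelta(minutes=minutes)
            return [text for t, text in stamped[: i + 1] if t is not None and t >= start]
    return []


if __name__ == "__main__":
    # Usage: python lines_before.py <path to the node's log file>
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.stdout.writelines(lines_before(fh))
```

Running it against the logs of both the node reporting the failure and the node it lost contact with gives the context needed to see why the connection dropped.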

I don't understand. What timeout are you asking about?

