What is the timeout for writing to a replica shard?

What is the timeout for writing to a replica shard? If a write to a replica takes a long time, can I specify a timeout value for the write on the replica shard?

I am reading the link below.

https://www.elastic.co/guide/en/elasticsearch/guide/current/distrib-write.html

Valid write consistency values: one, quorum, and all

Because of the exception below, the index is turning yellow and shard reallocation is happening.

[2019-02-20T13:10:11,644][WARN ][o.e.c.a.s.ShardStateAction] [SEIGWPP05ES03] [sample_index][3] received shard failed for shard id [[sample_index][3]], allocation id [OKXJ2CKNR8CKilqJZpVe4Q], primary term [1], message [failed to perform indices:data/write/bulk[s] on replica [sample_index][3], node[IiIb5vQNSj6GAxyv2LbWBQ], [R], s[STARTED], a[id=OKXJ2CKNR8CKilqJZpVe4Q]], failure [NodeNotConnectedException[[samplenode][172.18.72.120:9300] Node not connected]]

No, there is no timeout. The primary waits indefinitely for the replica to respond (or for the connection to be closed).

The consistency parameter mentioned in your link no longer exists.
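
For readers on 5.0 or later: the request-level setting that took the place of `consistency` is `wait_for_active_shards`. It is only a check made before the write starts, not a timeout on the replica write itself, and the request's `timeout` parameter bounds how long that check may wait. A minimal sketch of building such a bulk URL, assuming a local node; the host, the index name, and the `bulk_url` helper are placeholders for illustration, not part of any client library.

```python
from urllib.parse import urlencode


def bulk_url(host, index, wait_for_active_shards="1", timeout="1m"):
    """Build a _bulk URL (hypothetical helper for illustration only)."""
    # The defaults mirror Elasticsearch's own: 1 active copy, 1 minute.
    query = urlencode({
        "wait_for_active_shards": wait_for_active_shards,
        "timeout": timeout,
    })
    return f"{host}/{index}/_bulk?{query}"


# Require the primary plus one replica to be active before the write starts.
url = bulk_url("http://localhost:9200", "sample_index",
               wait_for_active_shards="2", timeout="30s")
print(url)
```

Raising `wait_for_active_shards` makes a write fail earlier when copies are missing; it does not change which copies the write must reach once it has started.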

@DavidTurner I really appreciate your reply.

We have a 5-node Elasticsearch 6.1 cluster with an index that has 10 primary shards and 2 replicas. Does that mean it has to write to 2 shards before it sends an ack?

How can I avoid the index turning yellow? Or can I have it skip the check on the replica copy, regardless of whether the write reached the replica?

failed to perform indices:data/write/bulk[s] on replica.

We frequently get the exception above.

Is the connection timeout an Elasticsearch parameter?

What is the default ack to the application? Quorum?

I checked the master version of the guide:

https://www.elastic.co/guide/en/elasticsearch/guide/master/distrib-write.html

No, each write goes to the primary and both replicas before acknowledgement, so that's 3 shard copies in total.
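
The arithmetic for the index described above (10 primaries, 2 replicas each) can be sketched as follows. The `shard_copies` helper is a made-up name for illustration.

```python
def shard_copies(primaries, replicas_per_primary):
    """Return (copies each write must reach, total shard copies in the index)."""
    copies_per_shard = 1 + replicas_per_primary  # the primary plus its replicas
    return copies_per_shard, primaries * copies_per_shard


per_write, total = shard_copies(primaries=10, replicas_per_primary=2)
print(per_write, total)
```

So each document is written to 3 copies of one shard, and the index as a whole has 30 shard copies spread over the 5 nodes.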

Elasticsearch guarantees to write every document to every in-sync shard copy before responding. This means that if it can't write to a shard copy it must mark that copy as out-of-sync. That copy then becomes unassigned, and the index health therefore reports as yellow.
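
A toy model of that rule for a single shard, to make the consequence concrete. This is a sketch under simplifying assumptions, not how Elasticsearch is implemented: the copy names and both functions are invented for illustration.

```python
def replicate(in_sync, reachable):
    """Toy model: a write lands on every reachable in-sync copy;
    the copies it could not reach are marked out-of-sync."""
    written = set(in_sync) & set(reachable)
    out_of_sync = set(in_sync) - written
    return written, out_of_sync


def index_health(primary_assigned, assigned_replicas, expected_replicas):
    """Red: primary missing. Yellow: some replica missing. Green: all assigned."""
    if not primary_assigned:
        return "red"
    return "green" if assigned_replicas == expected_replicas else "yellow"


in_sync = {"primary", "replica-1", "replica-2"}
# replica-2 sits on a node that is not connected.
written, out_of_sync = replicate(in_sync, reachable={"primary", "replica-1"})
health = index_health("primary" in written,
                      len(written - {"primary"}),
                      expected_replicas=2)
print(sorted(written), sorted(out_of_sync), health)
```

The write is still acknowledged, but only after the unreachable copy has been removed from the in-sync set, which is why the index goes yellow until a replacement replica is allocated.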

There is no way to disregard this check. It's very important.

There will be more information in the logs telling you why. If you need help interpreting the logs then please share more information here - stack traces and other messages from around the same time are all important.

The guide to which you link was written about the 2.x series and is rather out of date. The reference manual has fresher information.

Thanks @DavidTurner for the valuable information.
Below is the stack trace. I see node not connected exceptions.
I don't see any exception on the disconnected node. Should I increase the node timeout value?

> [2019-03-26T02:16:55,992][WARN ][o.e.c.a.s.ShardStateAction] [sample_node_1] [sample_index][3] received shard failed for shard id [[sample_index][3]], allocation id [-HYXaiLbTRSUyFyRhTO6Dw], primary term [3], message [failed to perform indices:data/write/bulk[s] on replica [sample_index][3], node[IiIb5vQNSj6GAxyv2LbWBQ], [R], s[STARTED], a[id=-HYXaiLbTRSUyFyRhTO6Dw]], failure [NodeNotConnectedException[[sample_node_2][172.18.72.120:9300] Node not connected]]
org.elasticsearch.transport.NodeNotConnectedException: [sample_node_2][172.18.72.120:9300] Node not connected
        at org.elasticsearch.transport.TcpTransport.getConnection(TcpTransport.java:692) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.transport.TcpTransport.getConnection(TcpTransport.java:122) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:525) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:501) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction.sendReplicaRequest(TransportReplicationAction.java:1188) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicasProxy.performOn(TransportReplicationAction.java:1152) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplica(ReplicationOperation.java:171) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.performOnReplicas(ReplicationOperation.java:155) ~[elasticsearch-6.1.0.jar:6.1.0]
        at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:122) ~[elasticsearch-6.1.0.jar:6.1.0]

By default Elasticsearch does log something when a node is disconnected, and continues to log failure messages every few minutes if it can't reconnect. You're not sharing very much in the way of logs so it's not very easy to help here. Can you share the last few minutes of logs leading up to this NodeNotConnectedException? Perhaps use https://gist.github.com.
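
One way to collect those last few minutes is sketched below, assuming the default log layout in which each entry starts with a `[yyyy-MM-ddTHH:mm:ss,SSS]` timestamp. The helper names and the sample lines are made up for illustration; the sample lines are placeholders, not real Elasticsearch output.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "[2019-03-26T02:16:55,992]" timestamp of a log entry.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\]")


def parse_ts(line):
    """Return the entry's timestamp, or None for continuation lines."""
    m = TIMESTAMP.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S,%f") if m else None


def lines_before_failure(lines, marker="NodeNotConnectedException", minutes=5):
    """Timestamped lines from `minutes` before the first marker line up to it.
    Lines without a timestamp (e.g. stack frames) are skipped in this sketch."""
    stamped = [(parse_ts(line), line) for line in lines]
    hits = [ts for ts, line in stamped if ts and marker in line]
    if not hits:
        return []
    start = hits[0] - timedelta(minutes=minutes)
    return [line for ts, line in stamped if ts and start <= ts <= hits[0]]


sample = [  # placeholder lines, not real log output
    "[2019-03-26T02:00:00,000][INFO ][placeholder] too early to matter",
    "[2019-03-26T02:14:30,000][WARN ][placeholder] something just before",
    "[2019-03-26T02:16:55,992][WARN ][placeholder] NodeNotConnectedException",
]
window = lines_before_failure(sample)
print(len(window))
```

For a real file, something like `lines_before_failure(open(path).read().splitlines())` should work, with `path` pointing at the node's log file. Running it on both the node that reported the failure and the node that was disconnected gives the two views worth sharing.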

I don't understand. What timeout are you asking about?
