RetryOnPrimaryException in ES node


(Dinesh) #1

Hi, we are running ES 2.0 on a 16-node cluster with 2 indexes. One index has 5 shards with RF 2, and indexing a document into it fails with RetryOnPrimaryException. The error occurs on only one of the nodes, and writes time out (timeout value = 1 min). The ES process is not crashing on the nodes, though.
We create the index only once and never delete it. The other index does not have any issues.
Cluster health is green and all the shards are in the STARTED state. What does this error mean? Should I increase the timeout value?
Please let me know what information I can look into or provide here to debug this.

Thanks.

[2016-10-12 04:22:07,867][INFO ][rest.suppressed ] objindex/obj/BFJlY1U Params: {index=objindex, id=BFJlY1U, type=obj, timeout=60s}
[objindex][[objindex][4]] RetryOnPrimaryException[Dynamics mappings are not available on the node that holds the primary yet]
at org.elasticsearch.action.support.replication.TransportReplicationAction.executeIndexRequestOnPrimary(TransportReplicationAction.java:1069)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:170)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.performOnPrimary(TransportReplicationAction.java:579)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1.doRun(TransportReplicationAction.java:452)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-10-12 04:22:08,331][INFO ][rest.suppressed ] objindex/obj/BFNOd0M Params: {index=objindex, id=BFNOd0M, type=obj, timeout=60s}
[objindex][[objindex][4]] RetryOnPrimaryException[Dynamics mappings are not available on the node that holds the primary yet]
at org.elasticsearch.action.support.replication.TransportReplicationAction.executeIndexRequestOnPrimary(TransportReplicationAction.java:1069)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:170)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.performOnPrimary(TransportReplicationAction.java:579)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1.doRun(TransportReplicationAction.java:452)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


(Yannick Welsch) #2

It sounds like there is a problem updating the cluster state on the node holding that shard. Can you share the output of the following command, run on both the master node and the node throwing the exception:

curl -XGET 'http://localhost:9200/_cluster/state?local&pretty'

Probably the easiest way to fix this is to just restart the node.
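
If the two outputs are too large to compare by eye, a small script can do the comparison. The following is only a rough sketch, not an official tool: it assumes the cluster state JSON has a top-level `version` field and keeps mappings under `metadata.indices.<index>.mappings`, and the function names (`fetch_local_state`, `compare_states`, `main`) are made up for this example.

```python
import json
from urllib.request import urlopen


def fetch_local_state(host, port=9200):
    """Fetch the cluster state as seen locally by one node (same URL as the curl command)."""
    url = "http://%s:%d/_cluster/state?local" % (host, port)
    with urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def index_mappings(state, index):
    """Return the mappings of `index` from a cluster state dict (assumed layout)."""
    return state.get("metadata", {}).get("indices", {}).get(index, {}).get("mappings", {})


def compare_states(master_state, node_state, index):
    """Compare cluster state version and index mappings between two nodes."""
    master_version = master_state.get("version")
    node_version = node_state.get("version")
    return {
        "master_version": master_version,
        "node_version": node_version,
        "versions_match": master_version == node_version,
        "mappings_match": index_mappings(master_state, index)
        == index_mappings(node_state, index),
    }


def main(master_host, node_host, index):
    """Fetch both local states and print the comparison."""
    result = compare_states(
        fetch_local_state(master_host), fetch_local_state(node_host), index
    )
    print(json.dumps(result, indent=2))
```

Calling `main("master-host", "failing-host", "objindex")` with your own host names (the two used here are placeholders) prints whether the versions and mappings agree. A node whose version stays behind the master's, or whose mappings differ, would be consistent with the node not applying cluster state updates.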


(Dinesh) #3

Hi Yannick, thanks for the response. The output is around 20k characters, and the upload feature allows only image files. Can you please tell me how to provide the output of this command here? Thanks!


(Yannick Welsch) #4

Please use http://pastebin.com or a similar (free) service. If the cluster state contains private / confidential information, you can also make the paste exposure "unlisted" and send me the link in a private message here.

