How to recover in the Java API from "CurrentState[STARTED] shard is not a primary"

I am using the Java API to push BulkRequests to Elasticsearch. To test availability, I killed a data node and got the following DEBUG message in the Elasticsearch data node log:

[myindex][[myindex][1]] IllegalIndexShardStateException[CurrentState[STARTED] shard is not a primary]
at org.elasticsearch.index.shard.IndexShard.prepareIndexOnPrimary(IndexShard.java:557)
at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:212)
at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:119)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

However, I did not get any exception in the Java API. How do I recover from such errors?

If you look at the BulkResponse object, you probably got back a lot of individual errors, no?

Also, for some of these errors, we wait until the shard is in the right state and then retry, which means that the document could still have been indexed successfully. As @dadoonet said, you have to check the BulkResponse object for failures (e.g. using the method hasFailures()).
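
To make that check concrete, here is a minimal sketch of the per-item pattern. It uses stand-in types rather than the real Elasticsearch classes so that it compiles on its own: `ItemResult` and `BulkFailureCheck` are hypothetical names, not part of the client. With the actual client, the corresponding calls would be `BulkResponse.hasFailures()` and, for each `BulkItemResponse`, `isFailed()` and `getFailureMessage()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-item bulk result; NOT the Elasticsearch type.
final class ItemResult {
    final int itemId;            // position of the item in the bulk request
    final String failureMessage; // null when the item succeeded

    ItemResult(int itemId, String failureMessage) {
        this.itemId = itemId;
        this.failureMessage = failureMessage;
    }

    boolean isFailed() {
        return failureMessage != null;
    }
}

// Hypothetical helper illustrating the check; assumes results arrive in request order.
final class BulkFailureCheck {
    // Collects the positions of failed items so that only those need to be resent.
    static List<Integer> failedItemIds(List<ItemResult> items) {
        List<Integer> failed = new ArrayList<>();
        for (ItemResult item : items) {
            if (item.isFailed()) {
                failed.add(item.itemId);
            }
        }
        return failed;
    }
}
```

An item that was retried internally and eventually succeeded would not appear in this list at all, since only failures reported in the response are collected.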

I do check for errors by iterating over the array of BulkItemResponse for all responses to detect errors such as MapperParsingExceptions. But I haven't found any errors that correspond to the one I am seeing. Am I doing something wrong?

As I've said, all is good then. The log message (which is at DEBUG level) just indicates that an index request that was part of the bulk request could not be written to the shard right away, because the shard had not yet been promoted to primary. The indexing request was, however, internally retried shortly afterwards and succeeded before the BulkResponse was returned.
