No_shard_available_action_exception in integration tests

Hi,

I'm the author of JobRunr, a distributed job scheduling library that also supports Elasticsearch as a storage provider.

For my integration tests, I use Testcontainers to start an Elasticsearch container:

@Container
private static final ElasticsearchContainer elasticSearchContainer =
        new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:7.10.1")
                .withNetwork(network)
                .withNetworkAliases("elasticsearch")
                .withExposedPorts(9200);

There is one case where I always get a no_shard_available_action_exception even though the cluster health is yellow.

My code is as follows:

@Override
protected boolean isNewMigration(NoSqlMigration noSqlMigration) {
    try {
        System.out.println("Testing for new migration...");
        waitForHealthyCluster(client);
        GetResponse migration = client.get(new GetRequest(JOBRUNR_MIGRATIONS_INDEX_NAME, substringBefore(noSqlMigration.getClassName(), "_")), RequestOptions.DEFAULT);
        return !migration.isExists();
    } catch (IOException e) {
        throw new StorageException(e);
    }
}

And the logs:

========================================================================
Cluster health:YELLOW
========================================================================
Testing for new migration...
========================================================================
Cluster health:YELLOW
========================================================================
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
Exception in thread "main" ElasticsearchStatusException[Elasticsearch exception [type=no_shard_available_action_exception, reason=No shard available for [get [jobrunr_migrations][_doc][M001]: routing [null]]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_index_shard_state_exception, reason=CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]]];

How can I be sure that ES is ready to receive GetRequests?

IIRC there's an issue on that.

What I do on my side is catch failures like this one and retry until I reach a timeout.

You can see that here:

This is not ideal for sure, but at least my integration tests are no longer flaky...
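
To make the retry idea concrete, here is a minimal sketch in plain Java, with no Elasticsearch dependency. The class name RetryUntilTimeout, the Attempt interface and the retry method are hypothetical names made up for this example; they are not part of JobRunr, Testcontainers or the Elasticsearch client.

```java
import java.time.Duration;
import java.util.function.Predicate;

// Hypothetical helper: retries an operation until it succeeds or a timeout elapses.
final class RetryUntilTimeout {

    /** Like Supplier, but allowed to throw checked exceptions such as IOException. */
    interface Attempt<T> {
        T run() throws Exception;
    }

    private RetryUntilTimeout() {
    }

    /**
     * Runs the attempt until it returns normally. A failure is retried only if
     * the predicate accepts it and the timeout has not elapsed yet; otherwise
     * the last failure is rethrown wrapped in an IllegalStateException.
     */
    static <T> T retry(Attempt<T> attempt, Predicate<Exception> retryable,
                       Duration timeout, Duration pause) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                return attempt.run();
            } catch (Exception e) {
                boolean expired = System.nanoTime() - deadline >= 0;
                if (expired || !retryable.test(e)) {
                    throw new IllegalStateException("Gave up retrying", e);
                }
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException interrupted) {
                    // Preserve the interrupt flag so callers can still see it.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while retrying", interrupted);
                }
            }
        }
    }
}
```

In the code from the question, the client.get(...) call could be passed as the attempt, with a predicate that only accepts exceptions whose message mentions no_shard_available_action_exception, so that genuine failures still surface immediately.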

One thing you could do is check the index status (waitForHealthyIndex(JOBRUNR_MIGRATIONS_INDEX_NAME)) instead of the cluster status, although I guess you might hit the same issue.

An index reports yellow health if it's newly created, because we don't want clusters to indicate they're unhealthy just because a new index was created. I'm guessing it's that. Most of the Elasticsearch integration tests wait for a newly-created index to be green before proceeding.
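
As a rough illustration of waiting for a single index to turn green, the sketch below calls the cluster health REST endpoint scoped to one index, using its wait_for_status and timeout parameters. It uses only the JDK's java.net.http client (Java 11+) so it stays self-contained. The class IndexHealth and its methods are hypothetical names, and the string match on the response body is a simplification; a real test would parse the JSON.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: blocks until one index reports green health.
final class IndexHealth {

    private IndexHealth() {
    }

    /** Health URL for one index; Elasticsearch holds the request until green or timeout. */
    static URI greenHealthUri(String baseUrl, String index, int timeoutSeconds) {
        return URI.create(baseUrl + "/_cluster/health/" + index
                + "?wait_for_status=green&timeout=" + timeoutSeconds + "s");
    }

    /** Simplified check: HTTP 200 and a body that reports green status. */
    static boolean isGreen(int statusCode, String body) {
        return statusCode == 200 && body.contains("\"status\":\"green\"");
    }

    /** Returns true if the index became green before the server-side timeout. */
    static boolean waitForGreenIndex(String baseUrl, String index, int timeoutSeconds)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(greenHealthUri(baseUrl, index, timeoutSeconds))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return isGreen(response.statusCode(), response.body());
    }
}
```

With the Testcontainers setup from the question, the base URL would presumably be "http://" + elasticSearchContainer.getHttpHostAddress(). The high-level REST client should allow the same wait through a ClusterHealthRequest for the index with waitForGreenStatus().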

Hi,

Thank you both for the answers.

Will the cluster become green if there is only 1 node participating?

Yes, as long as you set number_of_replicas: 0 on any indices you create.
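
For reference, a minimal sketch of creating an index with zero replicas through the REST API, again using only the JDK HTTP client (Java 11+). The class SingleNodeIndex and its methods are hypothetical names invented for this example, and the index is created without any mappings.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: creates an index that can go green on a one-node cluster.
final class SingleNodeIndex {

    private SingleNodeIndex() {
    }

    /** Create-index body that disables replicas, so no second node is needed. */
    static String zeroReplicaSettings() {
        return "{\"settings\":{\"index\":{\"number_of_replicas\":0}}}";
    }

    /** Builds the PUT /{index} request carrying the zero-replica settings. */
    static HttpRequest createIndexRequest(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(zeroReplicaSettings()))
                .build();
    }

    /** Sends the request; returns true if Elasticsearch answered with HTTP 200. */
    static boolean createIndex(String baseUrl, String index)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(createIndexRequest(baseUrl, index), HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200;
    }
}
```

If the index is created through the high-level REST client instead, the same setting can be passed as index.number_of_replicas in the settings of the CreateIndexRequest.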