ES5 RestClient retry behavior with multiple nodes: Fix suggestion

trisport88 · November 3, 2016, 6:05pm

Hi,

I have been testing the behavior of the RestClient (ES 5) when a node that is part of a two node cluster is not running. I have configured the RestClient with two nodes:

localhost:9200
localhost:9321

When I construct the RestClient the node 9200 is not running:

restClient = RestClient.builder(hosts)
.setFailureListener(failureListener)
.build();

I then run:

restClient.performRequest("DELETE", "/my_index", new Hashtable<>());

The following exception is thrown even though the index is deleted:

java.lang.IllegalArgumentException: Self-suppression not permitted
at java.lang.Throwable.addSuppressed(Throwable.java:1043)
at org.elasticsearch.client.RestClient.addSuppressedException(RestClient.java:470)
at org.elasticsearch.client.RestClient.access$700(RestClient.java:84)
at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.trackFailure(RestClient.java:567)
at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:559)
at org.elasticsearch.client.RestClient$1.retryIfPossible(RestClient.java:352)
at org.elasticsearch.client.RestClient$1.failed(RestClient.java:331)
at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:134)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:179)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326)
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
at java.lang.Thread.run(Thread.java:745)
Suppressed: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:171)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:145)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
... 1 more
[CIRCULAR REFERENCE:java.net.ConnectException: Connection refused]

I traced through the code to see at what point the index was deleted and as far as I can tell the sequence is as follows:

Try deleting the index on 9200.
A call to public void failed(Exception failure) indicates that 9200 failed.
The request is retried on 9321.
A call to public void failed(Exception failure) indicates that 9321 failed.

Before the retry the index still exists, but after the retry it does not.

curl -I 'http://localhost:9321/my_index'
HTTP/1.1 200 OK
content-type: text/plain; charset=UTF-8
content-length: 0

curl -I 'http://localhost:9321/my_index'
HTTP/1.1 404 Not Found
content-type: text/plain; charset=UTF-8
content-length: 0

Even though the index was successfully deleted the method is still returning an exception.

Am I using the rest client incorrectly?

Cheers,

Stuart

trisport88 · November 3, 2016, 6:34pm

Hi,

Not sure if it makes a difference but my failure difference is defined as follows:

/**
 * HttpHost failure listener for the ElasticSearch client API
 */
private class MyRestClientFailureListener extends RestClient.FailureListener
{

    /**
     * Notifies that the host provided as argument has just failed
     */
    public void onFailure(HttpHost host)
    {
        //log failure
    }
}

trisport88 · November 5, 2016, 5:05am

Has anyone tried the ES5 RestClient against multiple nodes with the 'first' one shutdown? Does it work for you?

trisport88 · November 6, 2016, 2:12am

OK, so I have hacked around with the RestClient code and by adding one line it seems to behave as I would expect in an failover situation:

        private void retryIfPossible(Exception exception, Iterator<HttpHost> hosts, HttpRequestBase request) {
            if (hosts.hasNext()) {
                //in case we are retrying, check whether maxRetryTimeout has been reached
                long timeElapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime);
                long timeout = maxRetryTimeoutMillis - timeElapsedMillis;
                if (timeout <= 0) {
                    IOException retryTimeoutException = new IOException(
                            "request retries exceeded max retry timeout [" + maxRetryTimeoutMillis + "]");
                    listener.onDefinitiveFailure(retryTimeoutException);
                } else {
                    listener.trackFailure(exception);
                    request.reset();
                    HttpAsyncResponseConsumer<HttpResponse> newResponseConsumer = new HeapBufferedAsyncResponseConsumer();
                    performRequestAsync(startTime, hosts, request, newResponseConsumer, listener);
                }
            } else {
                listener.onDefinitiveFailure(exception);
            }
        }

I added the line creating a newResponseConsumer just before calling performRequestAsync. Is this a valid fix? My theory was that passing the old response consumer a second time did not work because it 'remembered' that an exception had already been thrown.

If this is a valid fix what is the best way to get this released?

Cheers,

Stuart

javanna · November 7, 2016, 10:06am

Hi Stuart,
I can confirm this is a bug.

Your potential fix showed where the problem is, but unfortunately it is not enough as we allow users to provide their own response consumer, which as you found out cannot be reused across multiple retries.

Working on a fix, also will need to find out why our tests didn't catch this... that is quite disappointing. Will fix all of this asap.

trisport88 · November 7, 2016, 3:23pm

Hi Luca,

Thanks very much for confirming that this is a bug and that you are working on a fix (My backup plan was to write my own HA client and I was not looking forward to doing this). I will keep an eye out for when you release this.

Again thanks for letting me know as I can now concentrate on other things.

Cheers,

Stuart

javanna · November 8, 2016, 4:05pm

A fix for this bug has just been merged, unfortunately it is too late for 5.0.1 but it will come with 5.0.2 and 5.1 .

Cheers
Luca

trisport88 · November 8, 2016, 6:27pm

Fantastic response time, thank you.

javanna · November 15, 2016, 11:18pm

Hi,
I have good news, the fix was released today with 5.0.1

trisport88 · November 16, 2016, 3:56am

Topic		Replies	Views
ES5: Correct RestClient.FailureListener behaviour & retry logic Elasticsearch	3	2103	January 3, 2017
HighLevel REST Client : Retry requests on failure Elasticsearch	2	6065	February 21, 2019
NodeJS client: retries Elasticsearch	1	1211	July 5, 2017
Connecting ES using RestHighLevelClient Elasticsearch	3	533	June 29, 2018
Retry for java High level rest client. ES version 6.1 Elasticsearch	10	6023	April 30, 2019

ES5 RestClient retry behavior with multiple nodes: Fix suggestion

Related topics