Java RestHighLevelClient gives different exception responses on a repeated request

Hi everyone,

I am hitting this "strange" behavior related to search_phase_execution_exceptions that I don't really understand and would like to know the underlying reasons for.

A bit of context; I'm running ES 7.8.1, the cluster has two ES nodes. I'm running it on my dev workstation (windows 10) with AdoptOpenJDK 11

The goal: To provide end users with the best possible "reason" for why a given search request failed (basically handling "all shards failed" scenarios)

I have defined a query string which I know will cause a lexical error:

    Test """

Going native json, this is the query, sent to test_index/_search endpoint:

    {
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "Test \"\"\""
          }
        }
      ]
    }
  }
}

And this is the response, 10/10 times:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "Failed to parse query [Test \"\"\"]",
        "index_uuid": "8XXlJMXlRHyZkWFF3ilzfw",
        "index": "test_index"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test_index",
        "node": "S1yQggTBSVW7hsZrg02rWA",
        "reason": {
          "type": "query_shard_exception",
          "reason": "Failed to parse query [Test \"\"\"]",
          "index_uuid": "8XXlJMXlRHyZkWFF3ilzfw",
          "index": "test_index",
          "caused_by": {
            "type": "parse_exception",
            "reason": "Cannot parse 'Test \"\"\"': Lexical error at line 1, column 9.  Encountered: <EOF> after : \"\"",
            "caused_by": {
              "type": "token_mgr_error",
              "reason": "Lexical error at line 1, column 9.  Encountered: <EOF> after : \"\""
            }
          }
        }
      }
    ]
  },
  "status": 400
}

However, if I make use of the RestHighLevelClient and the searchAsync() method, I experience a different behavior:
Repeating the same request - the exception "object" I get back varies, sometimes it holds a ResponseException part of the Suppressed array - sometimes the Suppressed array is empty.

A code snippet from the onFailure handler, note the logging of the esEx here:

public void onFailure(Exception e) {
            	
            	if(e instanceof ElasticsearchStatusException) {
        			ElasticsearchStatusException esEx = (ElasticsearchStatusException) e;
        			
        			Throwable rootCause = esEx.getRootCause();
        			Throwable cause = esEx.getCause();
        			Throwable[] suppressed = rootCause.getSuppressed();
        			
        			LOG.info("Cause: " + cause);
        			LOG.info("rootCause: " + rootCause);
        			LOG.info("Suppressed size: " + suppressed.length);
        			LOG.info("Suppressed: " + (suppressed.length > 0 ? suppressed[suppressed.length-1] : null));
            	}
            	
            	
                LOG.error("Exception caught from " + Stream.of(stackTrace).map(StackTraceElement::toString).collect(Collectors.joining("\n")));
                if(sr != null) {
                    LOG.error("ES request = {}", sr.source());
                }
                cf.completeExceptionally(exceptionTransformer.apply(e));
            }

Most of the times (approx 60-70%) I end up with this printed from my log statements:

c.k.rest.async.CompletableFutureUtils    : Cause: ElasticsearchException[Elasticsearch exception [type=token_mgr_error, reason=token_mgr_error: Lexical error at line 1, column 6.  Encountered:     <EOF> after : ""]] 
c.k.rest.async.CompletableFutureUtils    : rootCause: ElasticsearchException[Elasticsearch exception [type=token_mgr_error, reason=token_mgr_error: Lexical error at line 1, column 6.  Encountered: <EOF> after : ""]]
c.k.rest.async.CompletableFutureUtils    : Suppressed size: 0
c.k.rest.async.CompletableFutureUtils    : Suppressed: null

However, the other 30-40% of the times I issue this request, this is printed (see the ResponseException part of Suppressed array):

c.k.rest.async.CompletableFutureUtils    : Cause: null
c.k.rest.async.CompletableFutureUtils    : rootCause: ElasticsearchStatusException[Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]]
c.k.rest.async.CompletableFutureUtils    : Suppressed size: 1
c.k.rest.async.CompletableFutureUtils    : Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9202], URI [/test_index/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&preference=kadmin&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"query_shard_exception","reason":"Failed to parse query [1 \"\"\"]","index_uuid":"dAeiuu_tQ_SBKP1UAsnfVw","index":"test_index"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"test_index","node":"VBx7kpXwTaCtoq0cP_m_oA","reason":{"type":"query_shard_exception","reason":"Failed to parse query [1 \"\"\"]","index_uuid":"dAeiuu_tQ_SBKP1UAsnfVw","index":"test_index","caused_by":{"type":"parse_exception","reason":"Cannot parse '1 \"\"\"': Lexical error at line 1, column 6.  Encountered: <EOF> after : \"\"","caused_by":{"type":"token_mgr_error","reason":"Lexical error at line 1, column 6.  Encountered: <EOF> after : \"\""}}}}]},"status":400}

So to sum up;
Whenever the ResponseException is part of the exception returned, the cause is null but suppressed.length >= 1

If the ResponseException is not part of the exception returned, there is a cause object and suppressed.length == 0

Any clues to why I get this ambiguous behavior when the request is the same all the time? And can I do anything to get the ResponseException always?

I can add that regardless of which "flow" I end up in, the stacktrace is always coming from the same point in the ES code;

	at org.elasticsearch.client.RestHighLevelClient$1.onFailure(RestHighLevelClient.java:1783) ~[elasticsearch-rest-high-level-client-7.8.1.jar:7.8.1]
	at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:598) ~[elasticsearch-rest-client-7.8.1.jar:7.8.1]
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:343) ~[elasticsearch-rest-client-7.8.1.jar:7.8.1]
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:327) ~[elasticsearch-rest-client-7.8.1.jar:7.8.1]

Kind Regards

Eivind Rønnevik

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.