Elasticsearch search query returning "empty" response using elasticsearch python API

I am using the elasticsearch-py client in my real-time application.

The application performs search query aggregations like "sum of field settledAmount in last 10 days".

Most queries work without problems. However, a tiny percentage fail silently, returning the following:

    {
       "took":0,
       "timed_out":false,
       "_shards":{
          "total":0,
          "successful":0,
          "skipped":0,
          "failed":0
       },
       "hits":{
          "total":{
             "value":0,
             "relation":"eq"
          },
          "max_score":0.0,
          "hits":[
             
          ]
       }
    }

When the same query is repeated, the response is correct and contains the aggregations field with the expected value:

    {
      "took" : 191,
      "timed_out" : false,
      "_shards" : {
        "total" : 756,
        "successful" : 756,
        "skipped" : 632,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "total_settled_amount" : {
          "value" : 0.0
        },
        "unique_customer_count" : {
          "value" : 0
        }
      }
    }

I added some retries, and the issue seems to be load-related, since retrying the query without a significant delay (1s) ends in the exact same wrong result.
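The two responses above differ in two observable ways: the failing one reports `_shards.total` as 0 and has no `aggregations` key. A minimal sketch of a check built on those two signals follows; the function name `is_unprocessed_response` and the `expects_aggs` flag are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict


def is_unprocessed_response(response: Dict[str, Any], expects_aggs: bool = True) -> bool:
    """Return True if the response looks like the "empty" one shown above.

    Hypothetical helper: it only inspects the response body and makes no
    claim about why Elasticsearch produced it.
    """
    shards = response.get("_shards", {})
    # Signal 1: no shard took part in the search at all.
    no_shards = shards.get("total", 0) == 0 and shards.get("successful", 0) == 0
    # Signal 2: aggregations were requested but are absent from the body.
    missing_aggs = expects_aggs and "aggregations" not in response
    return no_shards or missing_aggs
```

Note that this check probably cannot tell a transient failure apart from a search that legitimately resolved to no indices, since that case may produce a body of the same shape.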

This is an example query:

    GET cases-*/_search
    {
      "from": 0,
      "size": 1000,
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "transactionAnnouncedEvent.transaction.processedAt": {
                  "from": "2024-08-29T00:00:00.000Z",
                  "to": "2024-08-29T13:46:49.000Z",
                  "include_lower": true,
                  "include_upper": true,
                  "boost": 1
                }
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "term": {
                      "field1": {
                        "value": "abc",
                        "boost": 1
                      }
                    }
                  }
                ],
                "adjust_pure_negative": true,
                "boost": 1
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1
        }
      },
      "_source": {
        "includes": [
          "caseId"
        ],
        "excludes": []
      },
      "aggs": {
        "total_settled_amount": {
          "sum": {
            "field": "settledAmount"
          }
        }
      }
    }

What might cause Elasticsearch to return such an "empty" response instead of a 429 status code?

Thanks for reaching out, @manropinxu. Do you have an example of the Python script you are running?

What is your elastic version?

This would be a simplified version of my script

    from elasticsearch import AsyncElasticsearch

    es = AsyncElasticsearch(
        url,
        max_retries=10,
        retry_on_timeout=True,
        retry_on_status=[429, 502, 503, 504],
        verify_certs=True,
        timeout=10,
    )
    query = {
        "track_total_hits": True,
        "size": size,
        "query": <query here>,
        "_source": "aggregations",
        "aggs": {
            "total_settled_amount": {
                "sum": {
                    "field": "settledAmount"
                }
            }
        },
    }

    response = await es.search(
        index=",".join(indices),
        body=query,
        headers={"Content-Type": "application/json", "X-Opaque-Id": opaque_id},
        ignore_unavailable=True,
        size=size,
    )

    # Check if the response is valid (empty shards=0 response issue):
    # no shards were searched and none reported success.
    shards = response.get("_shards", {})
    if shards.get("total") == 0 and shards.get("successful") == 0:
        raise ESNoShardsProcessedInResponseException(
            "Query failed as no shards processed in response",
            warning=True,
        )
    print(response)

This is the version

    "version" : {
      "number" : "7.10.2",
      "build_flavor" : "oss",
      "build_type" : "tar",
      "build_hash" : "unknown",
      "build_date" : "2023-12-20T06:48:26.509799Z",
      "build_snapshot" : false,
      "lucene_version" : "8.7.0",
      "minimum_wire_compatibility_version" : "6.8.0",
      "minimum_index_compatibility_version" : "6.0.0-beta1"
    }

Thanks for following up, @manropinxu. An empty response could suggest a situation where the query didn't effectively hit any shards. Have you considered adding retry logic to your Python script?

I'd at least upgrade to 7.17 to test whether the behavior has changed, and also because of all the patches (including security fixes) you are missing.

We have a retry mechanism: we wait 500ms and retry with a 10% backoff.

In most cases retrying helps; however, the issue sometimes persists even after 5 retries.
Are there any logs or a trace ID we could use to debug this?
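For readers following along, the retry mechanism described above could be sketched roughly as below. This is not the actual application code: it assumes "10% backoff" means each wait is 10% longer than the previous one, and the names `backoff_delays`, `search_with_retry`, `search`, and `is_valid` are hypothetical. The search call and the validity check are passed in as callables so the sketch does not depend on any particular client version.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Response = Dict[str, Any]


def backoff_delays(initial: float = 0.5, growth: float = 1.10, retries: int = 5) -> List[float]:
    """Waits in seconds before each retry: 0.5 s first, then 10% longer each time."""
    return [initial * growth ** i for i in range(retries)]


async def search_with_retry(
    search: Callable[[], Awaitable[Response]],
    is_valid: Callable[[Response], bool],
    delays: List[float],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Response:
    """Run `search`, retrying after each delay while the response is invalid."""
    response = await search()
    for delay in delays:
        if is_valid(response):
            return response
        await sleep(delay)  # back off before the next attempt
        response = await search()
    if not is_valid(response):
        # Assumption: give up loudly rather than return a misleading body.
        raise RuntimeError(f"response still invalid after {len(delays)} retries")
    return response
```

Here `search` would wrap the real `es.search(...)` call, and `is_valid` could be a check such as `lambda r: r.get("_shards", {}).get("total", 0) > 0`. Logging the attempt number together with the X-Opaque-Id on each retry may make the failing requests easier to correlate with server-side logs.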

Why do you need this parameter (ignore_unavailable=True)? This might explain the behavior too.

I need it so that aggregations on missing indices aggregate to 0.
It cannot explain the behavior, because the indices are always created beforehand. If it were the cause, the issue could only happen at the start of a month, but it also occurs at other times.