Unable to retrieve aggregation for keyword field with high cardinality (Data Table)

Hello,

I am gathering logs from one of our machines. One of our fields, error.keyword, is a high-cardinality field, and we run a terms aggregation on it. I ran the query over the last 5 minutes, a window that contains fewer than 1000 documents.

Request:

{
  "aggs": {
    "2": {
      "terms": {
        "field": "error.keyword",
        "order": {
          "_count": "desc"
        },
        "size": 1
      }
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {
    "Link": {
      "script": {
        "source": "doc['uid.keyword'].value",
        "lang": "painless"
      }
    }
  },
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2020-02-25T14:37:48.113Z",
              "lte": "2020-02-25T14:42:48.113Z"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Response:

{
  "resp": {
    "statusCode": 500,
    "error": "Internal Server Error",
    "message": "An internal server error occurred"
  }
}

The Kibana UI shows an error after 30000 ms.

Kibana log entry:
{"type":"error","@timestamp":"2020-02-25T14:45:14Z","tags":[],"pid":7187,"level":"error","error":{"message":"Unexpected token u in JSON at position 0","name":"SyntaxError","stack":"SyntaxError: Unexpected token u in JSON at position 0\n at JSON.parse (<anonymous>)\n at server.route.handler (/usr/share/kibana/src/legacy/core_plugins/elasticsearch/lib/create_proxy.js:85:21)\n at process._tickCallback (internal/process/next_tick.js:68:7)"},"url":{"protocol":null,"slashes":null,"auth":null,"host":null,"port":null,"hostname":null,"hash":null,"search":"?rest_total_hits_as_int=true&ignore_unavailable=true&ignore_throttled=true&preference=1582638187172&timeout=30000ms","query":{"rest_total_hits_as_int":"true","ignore_unavailable":"true","ignore_throttled":"true","preference":"1582638187172","timeout":"30000ms"},"pathname":"/elasticsearch/prod*/_search","path":"/elasticsearch/prod*/_search?rest_total_hits_as_int=true&ignore_unavailable=true&ignore_throttled=true&preference=1582638187172&timeout=30000ms","href":"/elasticsearch/prod*/_search?rest_total_hits_as_int=true&ignore_unavailable=true&ignore_throttled=true&preference=1582638187172&timeout=30000ms"},"message":"Unexpected token u in JSON at position 0"}

There are no errors in any of our Elasticsearch logs. I ran the exact same request manually in Dev Tools and got this error:

{
  "statusCode": 504,
  "error": "Gateway Time-out",
  "message": "Client request timeout"
}

Any idea what might be the reason for that?

It might be the way the terms aggregation is being executed. If it uses "global ordinals", it may be performing a costly examination of all terms in the field. Because you know only a small number of docs match, you can try setting "execution_hint": "map" on the terms aggregation. This loads the string values of the matching docs into RAM, rather than performing a global ordinals sweep to replace string values with ordinals when gathering matches.
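
As a rough sketch of that suggestion, the search body might look something like the one below. It assumes the same five-minute range as the original request (the timestamps are copied from it) and leaves out the script_fields and docvalue_fields parts, which should not be needed while "size" is 0:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2020-02-25T14:37:48.113Z",
              "lte": "2020-02-25T14:42:48.113Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "2": {
      "terms": {
        "field": "error.keyword",
        "execution_hint": "map",
        "order": {
          "_count": "desc"
        },
        "size": 1
      }
    }
  }
}
```

Keeping the range filter matters here: without a query the aggregation runs over every document in the index, and the map hint is only likely to help when few documents match.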

Hello Mark,

Thanks for the reply. I executed the aggregation from Dev Tools using "execution_hint": "map" as you suggested but the result is the same.

Query:

GET prod/_search
{
    "aggs": {
      "2": {
        "terms": {
          "field": "error.keyword",
          "execution_hint":"map",
          "order": {
            "_count": "desc"
          },
          "size": 1
        }
      }
    }
}

Result:

{
  "statusCode": 504,
  "error": "Gateway Time-out",
  "message": "Client request timeout"
}
