Query and Aggregation result doesn't match

Hi team, got a strange issue when doing query and aggregation. Here is my script

GET userindex/_search
{ "_source":["response4"],
  "query":{
    "bool": {
      "filter": [
        {"script": {
          "script": {
            "source": """
              return doc['category.keyword'].value == 'StudioUI' && doc['id.keyword'].value == '8a07b6be-c9c9-4002-b989-a4ca67ef51d2'
              """,
              "lang": "painless"
          }
        }
        }
      ]
  }},
  "aggs": {
    "FieldAff": {
      "terms": {
        "field": "response4.keyword",
        "size": 10000,
        "show_term_doc_count_error": true,
        "missing": "Missing"
      }
    }
  }
}

the result I got is

{
  "took" : 214,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "userindex",
        "_type" : "_doc",
        "_id" : "8a07b6be-c9c9-4002-b989-a4ca67ef51d2",
        "_score" : 0.0,
        "_routing" : "8a07b6be-c9c9-4002-b989-a4ca67ef51d2",
        "_source" : {
          "response4" : """Something went wrong
We encountered an error while preparing to depldpoint. You can get more details below.

An error occurred (Resed) when calling the Creaoint operation: The aunt-leel service limit 'ml.p3.2xle for endpage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Pleice Quotas to request an increase for thota. If Available, contapport to reque"""
        }
      }
    ]
  },
  "aggregations" : {
    "FieldAff" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Missing",
          "doc_count" : 1,
          "doc_count_error_upper_bound" : 0
        }
      ]
    }
  }
}

From the query result, we can see there is “response4” field value, but in the aggregation result, this is marked as “Missing”. Could anyone help explain this? thanks

It looks like there is no response4.keyword value indexed in the one document that was returned in your query, so it shows up in the "missing" key.

Can you explain a little more about what you're trying to accomplish here? Doing a terms aggregation on a text value probably isn't very helpful, which is what your query is attempting to accomplish.

Thank you for the response. I try to find the max length of response4. So think of one approach is doing stats aggregation on [response4.keyword].value.length(), as in our mapping [response4] is not aggregable and only [response4.keyword] is aggregable. Both fields hold the same value. When running script, no error returned and I got the max, min, sum... values for field. But latter I noticed some of the [response4.keyword] values are not included in the aggregation result. After checking I found both response4 and response4.keyword fields are populated with the same value for those docs. But those records are with longer string [response4.keyword] values. Also used the script I posted above, I confirmed no aggregation can be done on those records. So kind of confused why not able to include those records.

If you check the mapping for this field, it's likely you'll see something like the following:

"response4": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }

What this means is that if the value is longer than the ignore_above length, the keyword version of the field will not be indexed. You can find more information about this property here: ignore_above | Elasticsearch Guide [8.11] | Elastic

This is likely why you're not able to use the longer values in aggregations.

Instead of using the keyword, you might want to just retrieve the value from _source in your script:

emit(params['_source'].response4.length())

Could you give that a try and see if it works for you?

Thank you so much. Checked our mapping, it is with ignore_above: 256. Guess here is to suggest using runtime field. As our ES version is < 7.12, this feature is not available. But following your suggestion with idea doing max on params['_source'][response4].length() solve the problem.

GET userIndex/_search
{ "_source": ["response4"], 
   "query": {
     "exists": { 
       "field": "response4"
      }
   },
   "aggs": {
      "maxLen": {
         "max": {
            "script" : "if ( ! (params['_source']['response3'] instanceof List)) {return params['_source']['response3'].length();} else {return 1;}"
         }
      }
   }
}

Appreciate your help.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.