Rollup terms aggregation giving wrong result with array fields

Hello,

  1. I am trying to do rollup on array fields but when I use terms aggregation on that field I am getting wrong data in rollup query result.
  2. My index template looks like this :
{
          "subjects" : {
            "aliases" : { },
            "mappings" : {
              "doc" : {
                "properties" : {
                  "event_at" : {
                    "type" : "date"
                  },
                  "long_score" : {
                    "type" : "long"
                  },
                  "name" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "string_score" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  }
                }
              }
            }
  1. And documents are :
{
        "_index" : "subjects",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "cpp",
          "string_score" : [
            "300"
          ],
          "long_score" : [
            300
          ],
          "event_at" : "2020-01-01T01:00:00.000Z"
        }
      },
      {
        "_index" : "subjects",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "c",
          "string_score" : [
            "100",
            "200"
          ],
          "long_score" : [
            100,
            200
          ],
          "event_at" : "2020-01-01T01:00:00.000Z"
        }
      },
      {
        "_index" : "subjects",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "java",
          "string_score" : [ ],
          "long_score" : [ ],
          "event_at" : "2020-01-01T01:00:00.000Z"
        }
      },
      {
        "_index" : "subjects",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "name" : "php",
          "string_score" : [
            "400",
            "500"
          ],
          "long_score" : [
            400,
            500
          ],
          "event_at" : "2020-01-01T01:00:00.000Z"
        }
      },
      {
        "_index" : "subjects",
        "_type" : "doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "name" : "c",
          "string_score" : [
            "100",
            "200"
          ],
          "long_score" : [
            100,
            200
          ],
          "event_at" : "2020-01-01T01:00:00.000Z"
        }
      },
      {
        "_index" : "subjects",
        "_type" : "doc",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "name" : "c",
          "string_score" : [
            "100",
            "200"
          ],
          "long_score" : [
            100,
            200
          ],
          "event_at" : "2020-01-01T01:00:00.000Z"
        }
      },
      {
        "_index" : "subjects",
        "_type" : "doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "name" : "php",
          "string_score" : [
            "400",
            "500"
          ],
          "long_score" : [
            400,
            500
          ],
          "event_at" : "2020-01-01T01:00:00.000Z"
        }
      }
  1. And query that I ran on normal index:
GET subjects/_search
    {
      "aggs": {
        "name": {
          "terms": {
            "field": "name.keyword",
            "size": 10
          }
        },
        "string_score":{
          "terms": {
            "field": "string_score.keyword",
            "size": 10
          }
        },
        "long_score":{
          "terms": {
            "field": "long_score",
            "size": 10
          }
        },
        "sum_of_long_score":{
          "sum": {
            "field": "long_score"
          }
        }
      }
    }
  1. result of above query :
"aggregations" : {
    "sum_of_long_score" : {
      "value" : 3000.0
    },
    "string_score" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "100",
          "doc_count" : 3
        },
        {
          "key" : "200",
          "doc_count" : 3
        },
        {
          "key" : "400",
          "doc_count" : 2
        },
        {
          "key" : "500",
          "doc_count" : 2
        },
        {
          "key" : "300",
          "doc_count" : 1
        }
      ]
    },
    "long_score" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 100,
          "doc_count" : 3
        },
        {
          "key" : 200,
          "doc_count" : 3
        },
        {
          "key" : 400,
          "doc_count" : 2
        },
        {
          "key" : 500,
          "doc_count" : 2
        },
        {
          "key" : 300,
          "doc_count" : 1
        }
      ]
    },
    "name" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "c",
          "doc_count" : 3
        },
        {
          "key" : "php",
          "doc_count" : 2
        },
        {
          "key" : "cpp",
          "doc_count" : 1
        },
        {
          "key" : "java",
          "doc_count" : 1
        }
      ]
    }
  }
  1. rollup job configuration :
PUT _xpack/rollup/job/subjects
{
  "index_pattern": "subjects*",
  "rollup_index": "ri_subjects",
  "cron": "0 0/1 * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "event_at",
      "interval": "1d"
    },
    "terms": {
      "fields": [
        "string_score.keyword",
        "long_score",
        "name.keyword"
      ]
    }
  },
  "metrics": [
    {
      "field": "long_score",
      "metrics": [
        "sum"
      ]
    }
  ]
}
  1. rollup query that I used :
GET ri_subjects/_rollup_search?size=0
{
  "aggs":{
    "name":{
      "terms": {
        "field": "name.keyword",
        "size": 10
      }
    },
    "string_score":{
      "terms": {
        "field": "string_score.keyword",
        "size": 10
      }
    },
    "long_score":{
      "terms": {
        "field": "long_score",
        "size": 10
      }
    },
    "long_sum":{
      "sum": {
        "field": "long_score"
      }
    }
  }
}
  1. result of rollup query =
"aggregations" : {
    "string_score" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "100",
          "doc_count" : 6
        },
        {
          "key" : "200",
          "doc_count" : 6
        },
        {
          "key" : "400",
          "doc_count" : 4
        },
        {
          "key" : "500",
          "doc_count" : 4
        },
        {
          "key" : "300",
          "doc_count" : 1
        }
      ]
    },
    "long_score" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 100,
          "doc_count" : 6
        },
        {
          "key" : 200,
          "doc_count" : 6
        },
        {
          "key" : 400,
          "doc_count" : 4
        },
        {
          "key" : 500,
          "doc_count" : 4
        },
        {
          "key" : 300,
          "doc_count" : 1
        }
      ]
    },
    "name" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "c",
          "doc_count" : 12
        },
        {
          "key" : "php",
          "doc_count" : 8
        },
        {
          "key" : "cpp",
          "doc_count" : 1
        },
        {
          "key" : "java",
          "doc_count" : 1
        }
      ]
    },
    "long_sum" : {
      "value" : 11100.0
    }
  } 
  1. As we can see the counts of buckets from result of both normal and rollup queries with keys 100, 200, 400, 500 do not match. In normal index query we got "key:doc_count" as 100: 3, 200:3, 400:2, 500:2 and in rollup_search query we got 100:6, 200:6, 400:4,500:4.

  2. so my question is, is there any way to get doc-count in terms aggs of rollup query as same as normal query for array fields? or am doing something wrong here? any help would be appreciated. thanks in advance!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.