Hello,
- I am trying to do rollup on array fields but when I use terms aggregation on that field I am getting wrong data in rollup query result.
- My index template looks like this :
{ "subjects" : { "aliases" : { }, "mappings" : { "doc" : { "properties" : { "event_at" : { "type" : "date" }, "long_score" : { "type" : "long" }, "name" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }, "string_score" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } } } } }
- And documents are :
{ "_index" : "subjects", "_type" : "doc", "_id" : "2", "_score" : 1.0, "_source" : { "name" : "cpp", "string_score" : [ "300" ], "long_score" : [ 300 ], "event_at" : "2020-01-01T01:00:00.000Z" } }, { "_index" : "subjects", "_type" : "doc", "_id" : "1", "_score" : 1.0, "_source" : { "name" : "c", "string_score" : [ "100", "200" ], "long_score" : [ 100, 200 ], "event_at" : "2020-01-01T01:00:00.000Z" } }, { "_index" : "subjects", "_type" : "doc", "_id" : "3", "_score" : 1.0, "_source" : { "name" : "java", "string_score" : [ ], "long_score" : [ ], "event_at" : "2020-01-01T01:00:00.000Z" } }, { "_index" : "subjects", "_type" : "doc", "_id" : "4", "_score" : 1.0, "_source" : { "name" : "php", "string_score" : [ "400", "500" ], "long_score" : [ 400, 500 ], "event_at" : "2020-01-01T01:00:00.000Z" } }, { "_index" : "subjects", "_type" : "doc", "_id" : "5", "_score" : 1.0, "_source" : { "name" : "c", "string_score" : [ "100", "200" ], "long_score" : [ 100, 200 ], "event_at" : "2020-01-01T01:00:00.000Z" } }, { "_index" : "subjects", "_type" : "doc", "_id" : "6", "_score" : 1.0, "_source" : { "name" : "c", "string_score" : [ "100", "200" ], "long_score" : [ 100, 200 ], "event_at" : "2020-01-01T01:00:00.000Z" } }, { "_index" : "subjects", "_type" : "doc", "_id" : "7", "_score" : 1.0, "_source" : { "name" : "php", "string_score" : [ "400", "500" ], "long_score" : [ 400, 500 ], "event_at" : "2020-01-01T01:00:00.000Z" } }
- And query that I ran on normal index:
GET subjects/_search { "aggs": { "name": { "terms": { "field": "name.keyword", "size": 10 } }, "string_score":{ "terms": { "field": "string_score.keyword", "size": 10 } }, "long_score":{ "terms": { "field": "long_score", "size": 10 } }, "sum_of_long_score":{ "sum": { "field": "long_score" } } } }
- result of above query :
"aggregations" : { "sum_of_long_score" : { "value" : 3000.0 }, "string_score" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "100", "doc_count" : 3 }, { "key" : "200", "doc_count" : 3 }, { "key" : "400", "doc_count" : 2 }, { "key" : "500", "doc_count" : 2 }, { "key" : "300", "doc_count" : 1 } ] }, "long_score" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : 100, "doc_count" : 3 }, { "key" : 200, "doc_count" : 3 }, { "key" : 400, "doc_count" : 2 }, { "key" : 500, "doc_count" : 2 }, { "key" : 300, "doc_count" : 1 } ] }, "name" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "c", "doc_count" : 3 }, { "key" : "php", "doc_count" : 2 }, { "key" : "cpp", "doc_count" : 1 }, { "key" : "java", "doc_count" : 1 } ] } }
- rollup job configuration :
PUT _xpack/rollup/job/subjects { "index_pattern": "subjects*", "rollup_index": "ri_subjects", "cron": "0 0/1 * * * ?", "page_size": 1000, "groups": { "date_histogram": { "field": "event_at", "interval": "1d" }, "terms": { "fields": [ "string_score.keyword", "long_score", "name.keyword" ] } }, "metrics": [ { "field": "long_score", "metrics": [ "sum" ] } ] }
- rollup query that I used :
GET ri_subjects/_rollup_search?size=0 { "aggs":{ "name":{ "terms": { "field": "name.keyword", "size": 10 } }, "string_score":{ "terms": { "field": "string_score.keyword", "size": 10 } }, "long_score":{ "terms": { "field": "long_score", "size": 10 } }, "long_sum":{ "sum": { "field": "long_score" } } } }
- result of rollup query =
"aggregations" : { "string_score" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "100", "doc_count" : 6 }, { "key" : "200", "doc_count" : 6 }, { "key" : "400", "doc_count" : 4 }, { "key" : "500", "doc_count" : 4 }, { "key" : "300", "doc_count" : 1 } ] }, "long_score" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : 100, "doc_count" : 6 }, { "key" : 200, "doc_count" : 6 }, { "key" : 400, "doc_count" : 4 }, { "key" : 500, "doc_count" : 4 }, { "key" : 300, "doc_count" : 1 } ] }, "name" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "c", "doc_count" : 12 }, { "key" : "php", "doc_count" : 8 }, { "key" : "cpp", "doc_count" : 1 }, { "key" : "java", "doc_count" : 1 } ] }, "long_sum" : { "value" : 11100.0 } }
-
As we can see the counts of buckets from result of both normal and rollup queries with keys 100, 200, 400, 500 do not match. In normal index query we got "key:doc_count" as 100: 3, 200:3, 400:2, 500:2 and in rollup_search query we got 100:6, 200:6, 400:4,500:4.
-
so my question is, is there any way to get doc-count in terms aggs of rollup query as same as normal query for array fields? or am doing something wrong here? any help would be appreciated. thanks in advance!