Hi
I am facing issue in aggregation withing aggregation as results count does not match with total hits, but when I search aggregation on single record without second level aggregation then shows correct counts.
I have created indices as
- analytics-live-2021-06
- analytics-live-2021-07
- analytics-live-2021-08
- analytics-live-2021-09
All indices are differentiated by _type as impression,view,call,lead etc
And documents in all indices are like:
{
"_index": "analytics-live-2021-07",
"_type": "impression",
"_id": "AXqH8j2vAbeLubSJF1JN",
"_score": 12.767446,
"_source": {
"event_source": "android",
"uid": "9100483",
"entity_type": "node",
"url": "search_screen",
"bundle": "classified",
"entity_id": "45870510",
"entity_uid": "xxx",
"entity_url": "xyz/45870510",
"drupal": 7,
"ip": "37.208.130.216",
"user_agent": "Dalvik/2.1.0 (Linux; U; Android 10; Redmi Note 9 Pro MIUI/V12.0.3.0.QJZMIXM)",
"country": "QA",
"@timestamp": "2021-07-09T00:07:17"
}
},
{
"_index": "analytics-test-2021-09-new",
"_type": "view",
"_id": "AXugmm1ZoEhud55w2U8X",
"_score": 1,
"_source": {
"event_source": "android",
"uid": "0",
"entity_type": "node",
"url": "ad_detail_screen",
"bundle": "classified",
"entity_id": 45872435,
"entity_uid": "xxx",
"entity_url": "xyz/45872435",
"drupal": 7,
"ip": "2402:8100:2842:2460::f903:ee5",
"user_agent": "Dalvik/2.1.0 (Linux; U; Android 11; CPH2121 Build/RP1A.200720.011)",
"country": "IN",
"@timestamp": "2021-09-01T12:04:40"
}
},
There are total 1488349910 documents in all indices.
Now if run query for single entity_id field to get bucket aggregation of _type as:
GET analytics-live-2021*/_search
{
"size":0,
"query":{
"bool":{
"must":{
"terms":{
"entity_id":[45870510]
}
},
"filter":{
"range":{
"@timestamp":{"gt":"2021-06-01T00:00:00"}
}
}
}
},
"aggs":{
"_type":{
"terms":{"field":"_type"}
}
}
}
It gives following results which is accurate:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 20,
"successful": 20,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 8193,
"max_score": 0,
"hits": []
},
"aggregations": {
"_type": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "impression",
"doc_count": 8148
},
{
"key": "view",
"doc_count": 42
},
{
"key": "reveal",
"doc_count": 3
}
]
}
}
}
Entity ID: 45870510
Total Hits: 8193
Aggregation: impression: 8148, view: 42, reveal: 3 === Total 8193
Issue:
But now I want to get same aggregation per entity_id but it shows incorrect results count and comes with the huge difference
Query
GET analytics-live-2021*/_search
{
"size": 0,
"query": {
"bool": {
"must": {
"terms": {
"entity_id": [
45870510,45327508
]
}
},
"filter": {
"range": {
"@timestamp": {
"gt": "2021-06-01T00:00:00"
}
}
}
}
},
"aggs": {
"_type": {
"terms": {
"field": "entity_id.keyword"
},
"aggs": {
"_type": {
"terms": {
"field": "_type"
}
}
}
}
}
}
Results with wrong count as compare to total hits
{
"took": 871,
"timed_out": false,
"_shards": {
"total": 20,
"successful": 20,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11498,
"max_score": 0,
"hits": []
},
"aggregations": {
"_type": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "45870510",
"doc_count": 2292,
"_type": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "impression",
"doc_count": 2280
},
{
"key": "view",
"doc_count": 12
}
]
}
},
{
"key": "45327508",
"doc_count": 1009,
"_type": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "impression",
"doc_count": 975
},
{
"key": "view",
"doc_count": 32
},
{
"key": "reveal",
"doc_count": 2
}
]
}
}
]
}
}
}
Entity ID: 45870510
Aggregation: impression: 2280, view: 12, reveal: xx === Total 2292
Please help me to get correct results against each entity_id in multilevel aggregation
I need aggregated of _type field for each entity_id with actual count
Khurram Fraz