Multilevel aggregation does not show correct results as compare to total hits

Hi

I am facing issue in aggregation withing aggregation as results count does not match with total hits, but when I search aggregation on single record without second level aggregation then shows correct counts.

I have created indices as

  • analytics-live-2021-06
  • analytics-live-2021-07
  • analytics-live-2021-08
  • analytics-live-2021-09

All indices are differentiated by _type as impression,view,call,lead etc

And documents in all indices are like:

{
        "_index": "analytics-live-2021-07",
        "_type": "impression",
        "_id": "AXqH8j2vAbeLubSJF1JN",
        "_score": 12.767446,
        "_source": {
          "event_source": "android",
          "uid": "9100483",
          "entity_type": "node",
          "url": "search_screen",
          "bundle": "classified",
          "entity_id": "45870510",
          "entity_uid": "xxx",
          "entity_url": "xyz/45870510",
          "drupal": 7,
          "ip": "37.208.130.216",
          "user_agent": "Dalvik/2.1.0 (Linux; U; Android 10; Redmi Note 9 Pro MIUI/V12.0.3.0.QJZMIXM)",
          "country": "QA",
          "@timestamp": "2021-07-09T00:07:17"
        }
      },
      {
        "_index": "analytics-test-2021-09-new",
        "_type": "view",
        "_id": "AXugmm1ZoEhud55w2U8X",
        "_score": 1,
        "_source": {
          "event_source": "android",
          "uid": "0",
          "entity_type": "node",
          "url": "ad_detail_screen",
          "bundle": "classified",
          "entity_id": 45872435,
          "entity_uid": "xxx",
          "entity_url": "xyz/45872435",
          "drupal": 7,
          "ip": "2402:8100:2842:2460::f903:ee5",
          "user_agent": "Dalvik/2.1.0 (Linux; U; Android 11; CPH2121 Build/RP1A.200720.011)",
          "country": "IN",
          "@timestamp": "2021-09-01T12:04:40"
        }
      },

There are total 1488349910 documents in all indices.

Now if run query for single entity_id field to get bucket aggregation of _type as:

GET analytics-live-2021*/_search
{
  "size":0,
  "query":{
    "bool":{
      "must":{
        "terms":{
          "entity_id":[45870510]
        }
      },
      "filter":{
        "range":{
          "@timestamp":{"gt":"2021-06-01T00:00:00"}
          
        }
      }
    }
  },
  "aggs":{
    "_type":{
      "terms":{"field":"_type"}
    }
    
  }
}

It gives following results which is accurate:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 20,
    "successful": 20,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8193,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "_type": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "impression",
          "doc_count": 8148
        },
        {
          "key": "view",
          "doc_count": 42
        },
        {
          "key": "reveal",
          "doc_count": 3
        }
      ]
    }
  }
}

Entity ID: 45870510
Total Hits: 8193
Aggregation: impression: 8148, view: 42, reveal: 3 === Total 8193

Issue:

But now I want to get same aggregation per entity_id but it shows incorrect results count and comes with the huge difference

Query

GET analytics-live-2021*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": {
        "terms": {
          "entity_id": [
            45870510,45327508
          ]
        }
      },
      "filter": {
        "range": {
          "@timestamp": {
            "gt": "2021-06-01T00:00:00"
          }
        }
      }
    }
  },
  "aggs": {
    "_type": {
      "terms": {
        "field": "entity_id.keyword"
      },
      "aggs": {
        "_type": {
          "terms": {
            "field": "_type"
          }
        }
      }
    }
  }
}

Results with wrong count as compare to total hits

{
  "took": 871,
  "timed_out": false,
  "_shards": {
    "total": 20,
    "successful": 20,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11498,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "_type": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "45870510",
          "doc_count": 2292,
          "_type": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "impression",
                "doc_count": 2280
              },
              {
                "key": "view",
                "doc_count": 12
              }
            ]
          }
        },
        {
          "key": "45327508",
          "doc_count": 1009,
          "_type": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "impression",
                "doc_count": 975
              },
              {
                "key": "view",
                "doc_count": 32
              },
              {
                "key": "reveal",
                "doc_count": 2
              }
            ]
          }
        }
      ]
    }
  }
}

Entity ID: 45870510
Aggregation: impression: 2280, view: 12, reveal: xx === Total 2292

Please help me to get correct results against each entity_id in multilevel aggregation

I need aggregated of _type field for each entity_id with actual count

Khurram Fraz

Is this for Enterprise Search or Elasticsearch?

This is for elasticsearch

Ok let me move it for you (ps - there is a pretty big warning when you make a new post that category is not for Elasticsearch stuff :wink: ).

Thanks for your reply, I could not find proper category thats why I put here

That is curious.
What version of elasticsearch are you on?
Is entity_id mapped as a numeric field (long/integer etc)? If so does it behave differently if the search is on the entity_id.keyword field?

I am using Amazon Elasticsearch service version 5.6

entity_id is actually a numeric field and in many documents it is saved as text. so it is mixed text and number.

search with entity_id Or entity_id.keyword gives different results but not actual results. entity_id gives more results than entity_id.keyword.

That's quite an old version (released 4 years ago) so may well have issues that have been fixed already.

Showing the related JSON for mappings and documents for your example would help dig deeper.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.