Aggregations count vs hits count

Hi,
I would like to raise a concern about aggregation counts and explain it in more detail.
But first, for the sake of presenting the case, a little background:
I have a cluster with 3 master nodes, 3 ingest nodes, and 3 data nodes. The data nodes hold an index with 3 shards (each shard is ~10GB). This index is fed with data through a sidecar app such as a reindexer.

I saw that many documents are rebuilt (by deleting the old one and creating a new one). During this time an increase in the number of segments can be observed, which is not desirable for search performance.
Can we suspect that the wrong aggregation count is caused by data not being in sync between shards/replicas, given that we have a lot of segments?

In general, in our case we have an issue with the aggregation count vs. the hits count.

Is it possible that the aggregation count differs depending on which shards processed the request (even when the total hits count was correct in each response)?

If the index has a lot of segments with deleted documents, how should the aggregation count behave?

What are the exceptions in which the aggregation count differs from the hits count?

I've also seen the doc below, but it doesn't have any reference to this case.

Hello @NNI and welcome,

From the number of nodes in your cluster and the size of the shards, I can conclude that your cluster is not big and can handle the data you have. But I have a question: do you have 3 dedicated master nodes? Same for the ingest/data nodes, are they dedicated? If so, your cluster has 9 nodes in total. If not, do some nodes have multiple roles? For example, do data nodes act as ingest nodes as well?

Based on my understanding and on the link you provided, did you try the "show_term_doc_count_error" parameter of the terms aggregation? According to the doc, the aggregation's doc counts can be approximate when the data is spread over larger shards, which is not the case for you (you said you have 10GB per shard).
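
For reference, here is a minimal sketch of what such a request body could look like, written as a Python dict so it can be passed to whatever client you use. The field name "category_id" and the aggregation name "by_term" are hypothetical placeholders, not taken from your mapping.

```python
import json


def terms_agg_with_error_bounds(field, size=10):
    """Build a search body for a terms aggregation that reports per-bucket error bounds."""
    return {
        "size": 0,  # return no hits, only the aggregation
        "aggs": {
            "by_term": {  # hypothetical aggregation name
                "terms": {
                    "field": field,
                    "size": size,
                    # ask for doc_count_error_upper_bound on every bucket
                    "show_term_doc_count_error": True,
                }
            }
        },
    }


body = terms_agg_with_error_bounds("category_id")  # hypothetical field name
print(json.dumps(body, indent=2))
```

With this parameter enabled, each bucket should carry its own doc_count_error_upper_bound, which gives a first hint whether the per-bucket counts are approximate.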

If you can provide query results that show the difference, that might help.

Also, you could try setting index.refresh_interval, like:

PUT /my-index-000001/_settings
{
  "index" : {
    "refresh_interval" : "1s"
  }
}

If you have a data stream, check this.

Marwane.

Exactly, this is a cluster built on 9 nodes (I don't have any nodes with multiple roles).

The ingest nodes are placed separately and data is loaded through them,
but the query environment is configured so that search requests go directly to the data nodes.

=> index.refresh_interval has already been set to 60s.

The sample query results:

{
    "totalResults": 16,
    "hasNextPage": false,
    "requestProcessingTime": 87,
    "products": [
        {
            "id": "Para_e_45",
           

            ...

            "categoryLeaves": [
                {
                    "id": "_91916",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_50",
 

            ...

            "categoryLeaves": [
                {
                    "id": "_91914",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_51",

            ...

            "categoryLeaves": [
                {
                    "id": "_91914",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_516",

            ...

            "categoryLeaves": [
                {
                    "id": "_91914",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_510",

           ...

            "categoryLeaves": [
                {
                    "id": "_91914",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_543",

            ...

            "categoryLeaves": [
                {
                    "id": "_91937",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_549",

           ...

            "categoryLeaves": [
                {
                    "id": "_91937",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_511",

            ...

            "categoryLeaves": [
                {
                    "id": "_91937",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_521",

           ...

            "categoryLeaves": [
                {
                    "id": "_91937",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_531",

            ...

            "categoryLeaves": [
                {
                    "id": "_91926",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_541",

            ...

            "categoryLeaves": [
                {
                    "id": "_91926",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_551",

            ...

            "categoryLeaves": [
                {
                    "id": "_91926",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_555",

            ...

            "categoryLeaves": [
                {
                    "id": "_91926",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_562",

            ...

            "categoryLeaves": [
                {
                    "id": "_91926",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_565",

           ...

            "categoryLeaves": [
                {
                    "id": "_91926",
                    "isPrimary": true
                }
            ]
        },
        {
            "id": "Para_e_560",

          ...

            "categoryLeaves": [
                {
                    "id": "_91926",
                    "isPrimary": true
                }
            ]
        }
    ],
    "facets": {
        "categories": {
            "id": "SalesID6011088",
            "label": "Tires",
            "type": "hierarchical",
            "level": 2,
            "isLeaf": false,
            "children": [
                {
                    "id": "SalesID6012410",
                    "label": "Tires Cars”,
                    "type": "hierarchical",
                    "level": 3,
                    "isLeaf": false,
                    "children": [
                        {
                            "id": "SalesID6011093",
                            "label": "Tires SUV/4x4",
                            "type": "hierarchical",
                            "level": 4,
                            "isLeaf": false,
                            "children": [],
                            "selected": false,
                            "count": 7
                        },
                        {
                            "id": "SalesID6012439",
                            "label": "Delivery tire",
                            "type": "hierarchical",
                            "level": 4,
                            "isLeaf": false,
                            "children": [],
                            "selected": false,
                            "count": 4
                        },
                        {
                            "id": "SalesID6011089",
                            "label": "Tires passenger",
                            "type": "hierarchical",
                            "level": 4,
                            "isLeaf": false,
                            "children": [],
                            "selected": false,
                            "count": 2
                        }
                    ],
                    "selected": false,
                    "count": 13
                }
            ],
            "selected": false,
            "count": 13
        }
    }
}
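
To make the mismatch in this response concrete, the numbers can be tallied with a few lines of Python. This is only a sketch over values copied by hand from the response above. The mapping between the "_919xx" leaf IDs and the "SalesID..." facet IDs is not shown, so the sketch does not try to say which category the missing documents belong to.

```python
from collections import Counter

# Values copied from the sample response above.
total_results = 16
leaf_ids = (
    ["_91916"] * 1 + ["_91914"] * 4 + ["_91937"] * 4 + ["_91926"] * 7
)
facet_child_counts = {
    "SalesID6011093": 7,
    "SalesID6012439": 4,
    "SalesID6011089": 2,
}
facet_root_count = 13

# How many returned products carry each category leaf.
hits_per_leaf = Counter(leaf_ids)
# What the facet (aggregation) side reports in total.
facet_sum = sum(facet_child_counts.values())

print(sum(hits_per_leaf.values()))       # 16 products returned
print(facet_sum)                         # 13 counted by the facet children
print(total_results - facet_root_count)  # 3 products not reflected in the facet
```

So the hits side reports 16 products, while the facet side accounts for only 13.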

I still don't understand what you mean based on the sample result you posted. Can you elaborate a bit more, please, and post the query as well?

But I think you mean that when you run an aggregation, there is a doc count for each bucket and a hit count at the top. For example, this terms aggregation:

GET filebeat-8.7.0/_search
{
  "size": 0,
  "aggs": {
    "test": {
      "terms": {
        "field": "agent.hostname",
        "size": 10
      }
    }
  }
}

will result in this:

{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,  # <---- the total hit count for documents that match the aggregation
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "test": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 268556,
      "buckets": [
        {
          "key": "myhost1",
          "doc_count": 130474  # <--- total documents inside the bucket "myhost1"
        },
        {
          "key": "myhost2",
          "doc_count": 57297
        }
      #...
      ]
    }
  }
}

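A total hit count of 10000 with "relation": "gte" means that at least 10000 documents matched the query. By default Elasticsearch only counts hits exactly up to 10000 (see track_total_hits), while the aggregation still runs over all matching documents, which is why the bucket doc counts can be much larger than the reported total.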
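
One way to check whether a terms aggregation accounts for every hit is to ask for an exact total with track_total_hits and compare it with the sum of the bucket counts plus sum_other_doc_count. Below is a small sketch in Python. The response fragment is made up for illustration, and the helper names are my own.

```python
def search_body_exact_total(field, size=10):
    """Same terms aggregation as above, but with an exact hits total."""
    return {
        "size": 0,
        "track_total_hits": True,  # count all matching documents exactly
        "aggs": {"test": {"terms": {"field": field, "size": size}}},
    }


def docs_counted_by_terms_agg(agg):
    """Sum of the returned bucket counts plus the documents in buckets not returned."""
    return sum(b["doc_count"] for b in agg["buckets"]) + agg["sum_other_doc_count"]


# Hypothetical aggregation fragment, not taken from a real response.
example_agg = {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 5,
    "buckets": [
        {"key": "a", "doc_count": 7},
        {"key": "b", "doc_count": 4},
    ],
}

body = search_body_exact_total("agent.hostname")
print(docs_counted_by_terms_agg(example_agg))  # 7 + 4 + 5 = 16
```

Keep in mind that this sum only matches the hits total when every matching document has exactly one value for the field. Documents without the field fall into no bucket, and documents with several values are counted in several buckets.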
