Use dfs_query_then_fetch with aggregation


(Ankur Singla) #1

I am trying to use a terms aggregation with dfs_query_then_fetch in order to get accurate values, but I am still not getting accurate counts.

QUERY:

GET demo/_search?pretty=true&search_type=dfs_query_then_fetch
{
    "aggs" : {
        "products" : {
            "terms" : {
                "field" : "demo1.keyword",
                "size" : 5
            }
        }
    }
}

Result:

"aggregations": {
    "products": {
      "doc_count_error_upper_bound": 10959,
      "sum_other_doc_count": 10454016,
      "buckets": [
        {
          "key": "SK148",
          "doc_count": 4442
        },
        {
          "key": "SK67",
          "doc_count": 4432
        },
        {
          "key": "SK489",
          "doc_count": 4420
        },
        {
          "key": "SK592",
          "doc_count": 2245
        },
        {
          "key": "SK88",
          "doc_count": 2245
        }
      ]
    }
  }

I expected that using dfs_query_then_fetch with a terms aggregation would give me a doc_count_error_upper_bound of zero.


(Simon Willnauer) #2

No, a terms aggregation won't change its accuracy if you use DFS. DFS only applies to the term statistics used for scoring in full-text search.


(Ankur Singla) #3

Is there any way to get accurate terms aggregation results?


(Mark Harwood) #4

Your request is for the top 5 terms. I'm not clear how many shards you have (that bit is missing from the results you shared).

Each shard returns more than 5 top terms (a multiple of the requested size), in the hope that this provides enough candidates to arrive at an accurate overall top 5. It is possible, however, that the default selection is not large enough to deliver a fully accurate response, and this is reflected in the doc_count_error_upper_bound value in the response. If you manually increase the selection size using the shard_size setting, you can reduce the error bound to a desirable level (e.g. zero error).
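To make the effect of shard_size concrete, here is a small Python sketch of the merge step. It is a simplified model, not Elasticsearch's actual code: the shard contents are made up, the function names are hypothetical, and the error bound is approximated as the sum of the last returned count on each shard that truncated its list.

```python
from collections import Counter


def shard_top_terms(counts, shard_size):
    """Return one shard's top `shard_size` (term, count) pairs and its error contribution."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ranked[:shard_size]
    # If the shard dropped terms, any dropped term has at most the last returned count.
    truncated = len(ranked) > shard_size
    error = top[-1][1] if truncated and top else 0
    return top, error


def merged_terms(shards, size, shard_size):
    """Merge per-shard top lists the way a coordinating node would (simplified)."""
    merged = Counter()
    error_bound = 0
    for counts in shards:
        top, error = shard_top_terms(counts, shard_size)
        merged.update(dict(top))  # add this shard's counts to the running totals
        error_bound += error
    buckets = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return buckets, error_bound


# Made-up per-shard term counts (illustrative only).
shards = [
    {"A": 10, "B": 9, "C": 8, "D": 1},
    {"A": 10, "C": 9, "D": 8, "B": 7},
    {"D": 10, "B": 9, "A": 2, "C": 1},
]

# Small shard_size: counts are underestimated and the error bound is non-zero.
print(merged_terms(shards, size=2, shard_size=2))
# Each shard returns every term it holds: exact counts, error bound of zero.
print(merged_terms(shards, size=2, shard_size=4))
```

In this toy data the small shard_size even ranks the top two terms in the wrong order, because one shard never reports its count for "B". Once shard_size covers every distinct term on each shard, nothing is dropped and the bound falls to zero.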


(Ankur Singla) #5

I have 6 shards, but that is just for the example; my actual goal is to get accurate terms aggregation results. The top 5 is a variable: it could be top 10, top 2, and so on.


(Mark Harwood) #6

My advice on shard_size still stands for any given choice of size.
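As a rough sketch of how that could look when size is a variable, the Python helper below builds the request body with shard_size derived from size. The helper name and the `oversample` factor are hypothetical and the value 200 is arbitrary; you would tune it until doc_count_error_upper_bound in the response reaches the level you want. The field name is the one from the original query.

```python
import json


def terms_agg_body(field, size, oversample=200):
    """Build a terms aggregation request body with an explicit shard_size.

    `oversample` is a hypothetical tuning knob, not an Elasticsearch setting:
    each shard is asked for size * oversample candidate terms.
    """
    return {
        "aggs": {
            "products": {
                "terms": {
                    "field": field,
                    "size": size,
                    "shard_size": size * oversample,
                }
            }
        }
    }


# Same aggregation as in the original post, for a few choices of size.
for size in (2, 5, 10):
    print(json.dumps(terms_agg_body("demo1.keyword", size), indent=4))
```

A larger shard_size makes each shard do more work and send more data to the coordinating node, so it is worth raising it only as far as the error bound requires.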

