How does dfs_query_then_fetch work?


(Ankur Singla) #1

Hi guys

As per my understanding, dfs_query_then_fetch is used to get accurate results across multiple shards.

A terms aggregation returns an approximate count per term (I am still confused about how it calculates the document count error).

If I use a dfs_query_then_fetch search with a terms aggregation, will it return accurate results? I tried, but it does not: I am still getting a doc_count_error_upper_bound value greater than zero.

QUERY:

GET demo/_search?pretty=true&search_type=dfs_query_then_fetch
{
    "aggs" : {
        "products" : {
            "terms" : {
                "field" : "demo1.keyword",
                "size" : 5
            }
        }
    }
}

Result:

"aggregations": {
    "products": {
      "doc_count_error_upper_bound": 10959,
      "sum_other_doc_count": 10454016,
      "buckets": [
        {
          "key": "SK148",
          "doc_count": 4442
        },
        {
          "key": "SK67",
          "doc_count": 4432
        },
        {
          "key": "SK489",
          "doc_count": 4420
        },
        {
          "key": "SK592",
          "doc_count": 2245
        },
        {
          "key": "SK88",
          "doc_count": 2245
        }
      ]
    }
  }

I think I have a misconception here.

Please guide me on this. It would be a great help.


(Mikhail Khludnev) #2

I believe dfs_query_then_fetch calculates terms' IDFs globally without
per-shard skew.
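
A minimal sketch of what "global IDF" means here, assuming a BM25-style IDF formula and made-up per-shard statistics (the `idf` helper and the numbers are illustrative, not taken from Elasticsearch):

```python
import math

def idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style inverse document frequency."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical statistics for one term on two shards:
# (documents containing the term, total documents on the shard).
shards = [(5, 1000), (400, 1000)]

# query_then_fetch: each shard scores with its own local statistics,
# so the same term gets a different weight on each shard.
local_idfs = [idf(df, n) for df, n in shards]

# dfs_query_then_fetch: the statistics are summed in an extra round trip
# first, so every shard scores with the same global IDF.
global_df = sum(df for df, _ in shards)
global_n = sum(n for _, n in shards)
global_idf = idf(global_df, global_n)

print("local IDFs:", [round(x, 3) for x in local_idfs])
print("global IDF:", round(global_idf, 3))
```

Only the relevance score changes between the two modes in this sketch; no document counts are involved.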


(Ankur Singla) #3

Yeah. I am assuming this means it will calculate counts over the whole data set rather than at the shard level, so I should get accurate counts with doc_count_error_upper_bound either absent or zero.

Am I right?


(Mikhail Khludnev) #4

No. I believe dfs_query_then_fetch impacts only scoring (search), but not
aggregations.
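
As a rough illustration of where doc_count_error_upper_bound comes from, here is a toy model, assuming the behaviour described in the terms aggregation documentation: each shard returns only its top `shard_size` terms, and the error bound is the sum of the last returned count from every shard that had to cut terms. The `terms_agg` function and the shard contents are hypothetical; the SK keys are borrowed from the result above only as labels.

```python
from collections import Counter

def terms_agg(shards, size, shard_size):
    """Toy model of a terms aggregation merged across shards.

    Each shard reports only its top `shard_size` terms; the coordinating
    node sums what it receives and keeps the top `size` terms.
    """
    merged = Counter()
    error_upper_bound = 0
    for shard in shards:
        top = Counter(shard).most_common(shard_size)
        merged.update(dict(top))
        # A shard that had to cut terms may be hiding up to the count of
        # its last returned term for any term it did not report.
        if len(shard) > shard_size:
            error_upper_bound += top[-1][1]
    return merged.most_common(size), error_upper_bound

# Hypothetical per-shard term counts (term -> doc_count on that shard).
shards = [
    {"SK148": 50, "SK67": 40, "SK489": 30, "SK592": 29},
    {"SK592": 60, "SK148": 20, "SK67": 10, "SK489": 5},
]

# Small shard_size: SK592 is cut on the first shard, so its count is low.
approx, approx_error = terms_agg(shards, size=2, shard_size=2)

# shard_size large enough to return every term: counts are exact.
exact, exact_error = terms_agg(shards, size=2, shard_size=4)

print(approx, approx_error)
print(exact, exact_error)
```

The search type never enters this model; only `shard_size` does. In a real request, raising `shard_size` inside the `terms` block is the knob that reduces the error, at the cost of more data returned per shard.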


(Madhusudan Atmakuri) #5

I am not sure I understand. Based on this:

https://github.com/elastic/elasticsearch/issues/20580

dfs_query_then_fetch should return results, skipping the first phase of query_then_fetch. The documentation does not say that it does not impact aggregations.


(Mikhail Khludnev) #6

That issue is about an incorrect explanation, which is a separate thing.

Madhusudan_Atmakuri (https://discuss.elastic.co/u/madhusudan_atmakuri) wrote on January 11:

I am not sure I understand. Based on this:

Issue: "dfs_query_then_fetch does work sometimes" (https://github.com/elastic/elasticsearch/issues/20580), opened by XGiton on 2016-09-20, closed by jimczi on 2016-09-20

Elasticsearch version: Elasticsearch 5.0.0-alpha5
Plugins installed: [analyzer-ik]
JVM version: jdk1.8.0_101
OS version: Ubuntu 14.04
Description of the problem including expected versus actual behavior:
dfs_query_then_fetch does not...

dfs_query_then_fetch should return result skipping the first phase of query_then_fetch. The documentation does not say that it does not impact aggregations.

The documentation doesn't say it does.

