Min doc sub aggregation (find duplicates)

champtar · September 9, 2017, 12:04am

Hi,

I'm running this query:

GET _search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-365d/d",
        "lte": "now-10m/m"
      }
    }
  },
  "aggs": {
    "timestamp": {
      "terms": {
        "field": "@timestamp",
        "size": 10,
        "order": {
          "_term": "asc"
        }
      },
      "aggs": {
        "k": {
          "terms": {
            "field": "k",
            "size": 10000
          },
          "aggs": {
            "v": {
              "terms": {
                "field": "v",
                "size": 10000,
                "min_doc_count": 2
              }
            }
          }
        }
      }
    }
  }
}

And I get this response:

{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 83047,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "timestamp": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 82939,
      "buckets": [
        {
          "key": 1488163800000,
          "key_as_string": "1488163800000",
          "doc_count": 4,
          "k": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "request_type",
                "doc_count": 3,
                "v": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": []
                }
              },
              {
                "key": "stream_protocol",
                "doc_count": 1,
                "v": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": []
                }
              }
            ]
          }
        },
        {
          "key": 1488163860000,
          "key_as_string": "1488163860000",
          "doc_count": 8,
          "k": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "request_type",
                "doc_count": 5,
                "v": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "vod",
                      "doc_count": 3
                    }
                  ]
                }
              },
...

I would like the response to omit timestamp = 1488163800000, because its sub-sub-aggregations are empty.
I'm trying to find duplicates without using scripts. Would it be possible to filter out a timestamp bucket when there are no duplicates in timestamp > k > v?

Otherwise, what I'm looking for is the 10 oldest {timestamp, k, v} combinations that are duplicates.
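
To make the goal concrete, here is a rough sketch of the result I'm after, done as client-side post-processing of the response above rather than inside Elasticsearch (so it is not the server-side filter I'm asking about). The function name `oldest_duplicates` and the trimmed `sample_response` are made up for illustration; the sample keeps only the fields of the real response that the function reads.

```python
def oldest_duplicates(response, limit=10):
    """Collect up to `limit` (timestamp, k, v, doc_count) tuples, oldest first.

    Timestamp buckets whose nested "v" buckets are all empty contribute
    nothing, so they are effectively filtered out.
    """
    found = []
    ts_buckets = response["aggregations"]["timestamp"]["buckets"]
    # The query already orders by term ascending; sorting again keeps the
    # function correct even if the response was produced with another order.
    for ts in sorted(ts_buckets, key=lambda b: b["key"]):
        for k in ts["k"]["buckets"]:
            # "v" only contains buckets that passed min_doc_count: 2
            for v in k["v"]["buckets"]:
                found.append((ts["key"], k["key"], v["key"], v["doc_count"]))
                if len(found) == limit:
                    return found
    return found


# Hypothetical sample: the two timestamp buckets shown above, trimmed.
sample_response = {
    "aggregations": {
        "timestamp": {
            "buckets": [
                {
                    "key": 1488163800000,
                    "k": {"buckets": [
                        {"key": "request_type", "v": {"buckets": []}},
                        {"key": "stream_protocol", "v": {"buckets": []}},
                    ]},
                },
                {
                    "key": 1488163860000,
                    "k": {"buckets": [
                        {"key": "request_type",
                         "v": {"buckets": [{"key": "vod", "doc_count": 3}]}},
                    ]},
                },
            ]
        }
    }
}

print(oldest_duplicates(sample_response))
# prints [(1488163860000, 'request_type', 'vod', 3)]
```

The catch is that the outer terms aggregation only returns 10 timestamps, so if some of them have no duplicates this yields fewer than 10 results unless the outer "size" is raised. That is why I would prefer to do the filtering in the query itself.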

Thanks
Etienne

system · October 7, 2017, 12:04am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
