Aggregations on Unique Documents by field per day

baahubali · February 23, 2022, 9:54am

We can find the Average of fields of all of the docs by using simple Metric. Great!

We can find the Average of Averages per day using Average Bucket. Great!

But how do we find Average of fields in single unique document by a field for a day?

For example, assume we have these documents with fields code, price and date:

{ code: BTC, price: 30000, date: 23/02/2022 },
{ code: BTC, price: 30000, date: 23/02/2022 },
{ code: ETH, price: 5000, date: 23/02/2022 },
{ code: BTC, price: 50000, date: 24/02/2022 },
{ code: BTC, price: 50000, date: 24/02/2022 },
{ code: BTC, price: 50000, date: 24/02/2022 },
{ code: ETH, price: 7000, date: 24/02/2022 }

Now, my issue is that I have duplicate entries for code for each day. When I am finding averages, I want to count a code only once for each day.

So the idea is to calculate averages of price for unique codes per day.

I cannot figure this out using Bucket Pipelines. I need a bucket with unique (average, min or max) values for each day and then calculate its averages.

So, in the example above, for code: BTC, I do not need:

average = (30000 + 30000 + 30000 + 50000 + 50000)/5 = 38000

But I need the average to be:

average = (30000 + 50000)/2 = 40000

Tomo_M · February 23, 2022, 10:38am

Hi,

Here is sample query.
You may need avg_bucket aggregation after avg aggregation.

PUT /test_btc/

POST /test_btc/_bulk
{"index":{}}
{"code": "BTC", "price": 30000, "date": "2022-02-23"}
{"index":{}}
{"code": "BTC", "price": 30000, "date": "2022-02-23" }
{"index":{}}
{ "code": "ETH", "price": 5000, "date": "2022-02-23" }
{"index":{}}
{ "code": "BTC", "price": 50000, "date": "2022-02-24" }
{"index":{}}
{ "code": "BTC", "price": 50000, "date": "2022-02-24" }
{"index":{}}
{ "code": "BTC", "price": 50000, "date": "2022-02-24" }
{"index":{}}
{ "code": "ETH", "price": 7000, "date": "2022-02-24" }

GET /test_btc/_search
{
  "size":0,
  "aggs":{
    "code":{
      "terms":{
        "field": "code.keyword"
      },
      "aggs": {
        "date": {
          "date_histogram": {
            "field": "date",
            "calendar_interval": "day"
          },
          "aggs":{
            "avg_code_date":{
              "avg": {
                "field": "price"
              }
            }
          }
        },
        "avg_code":{
          "avg_bucket": {
            "buckets_path": "date>avg_code_date"
          }
        }
      }
    }
  }
}

But if it is truly duplicated, deduplicate such documents at indexing time could be more simple way.

baahubali · February 23, 2022, 11:06am

They differ by few fields.

Thank you for the query. Is this possible in Visualization on UI? I knew similar methods in Time-Series too but I need to keep all my Visualization and change the calculations - right now they are taking all codes and not the unique per day.

Tomo_M · February 23, 2022, 11:38am

I have no idea to visualize the query directly other than using vega/vega-lite. But if it were up to me, I would create transform and visualize the dest index.

Something like:

GET /_transform/_preview
{
  "pivot":{
    "aggs":{
      "price":{
        "avg":{
          "field":"price"
        }
      }
    },
    "group_by":{
      "code":{
        "terms":{
          "field":"code.keyword"
        }
      },
      "date": {
        "date_histogram": {
          "field": "date",
          "calendar_interval": "day"
        }
      }
    }
  },
  "source":{
    "index":"test_btc"
  }
}

Dest index should be:

{
  "date" : "2022-02-23T00:00:00.000Z",
  "code" : "BTC",
  "price" : 30000.0
},
{
  "date" : "2022-02-23T00:00:00.000Z",
  "code" : "ETH",
  "price" : 5000.0
},
{
  "date" : "2022-02-24T00:00:00.000Z",
  "code" : "BTC",
  "price" : 50000.0
},
{
  "date" : "2022-02-24T00:00:00.000Z",
  "code" : "ETH",
  "price" : 7000.0
}

system · March 23, 2022, 11:39am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to bucket aggregate on document without any numeric field Kibana	16	489	March 20, 2022
Excluding Duplicate Values in Kibana Filter Kibana	3	11218	September 11, 2019
Aggregate 7 fields and get the sum of another field Elasticsearch	5	377	February 12, 2021
Sum aggregation along with Unique Count aggregation Kibana	5	5401	July 6, 2017
Average sum calculation in kibana Kibana	4	3164	July 6, 2017

Aggregations on Unique Documents by field per day

Related topics