Aggregations on Unique Documents by field per day

We can find the average of a field across all of the docs by using a simple metric aggregation. Great!
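
For example, something like this (borrowing the test_btc index that is set up later in this thread):

GET /test_btc/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}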

We can find the average of per-day averages using an avg_bucket pipeline aggregation. Great!
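
For example, a sketch along these lines: daily buckets via date_histogram, an avg inside each bucket, and an avg_bucket pipeline over them:

GET /test_btc/_search
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_avg": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "avg_of_daily_avgs": {
      "avg_bucket": {
        "buckets_path": "per_day>daily_avg"
      }
    }
  }
}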

But how do we find the average of a field over unique documents (unique by some field) per day?

For example, assume we have these documents with the fields code, price, and date:

{ code: BTC, price: 30000, date: 23/02/2022 },
{ code: BTC, price: 30000, date: 23/02/2022 },
{ code: ETH, price: 5000, date: 23/02/2022 },
{ code: BTC, price: 50000, date: 24/02/2022 },
{ code: BTC, price: 50000, date: 24/02/2022 },
{ code: BTC, price: 50000, date: 24/02/2022 },
{ code: ETH, price: 7000, date: 24/02/2022 }

Now, my issue is that I have duplicate entries for each code on each day. When calculating averages, I want to count each code only once per day.

So the idea is to calculate the average price over unique codes per day.

I cannot figure this out using bucket pipeline aggregations. I need a bucket with a single value (average, min, or max) per code for each day, and then to calculate the average of those.

So, in the example above, for code: BTC, I do not need:

average = (30000 + 30000 + 50000 + 50000 + 50000)/5 = 42000

But I need the average to be:

average = (30000 + 50000)/2 = 40000

Hi,

Here is a sample query. You may need an avg_bucket aggregation on top of the avg aggregation.

PUT /test_btc/

POST /test_btc/_bulk
{"index":{}}
{"code": "BTC", "price": 30000, "date": "2022-02-23"}
{"index":{}}
{"code": "BTC", "price": 30000, "date": "2022-02-23" }
{"index":{}}
{ "code": "ETH", "price": 5000, "date": "2022-02-23" }
{"index":{}}
{ "code": "BTC", "price": 50000, "date": "2022-02-24" }
{"index":{}}
{ "code": "BTC", "price": 50000, "date": "2022-02-24" }
{"index":{}}
{ "code": "BTC", "price": 50000, "date": "2022-02-24" }
{"index":{}}
{ "code": "ETH", "price": 7000, "date": "2022-02-24" }

GET /test_btc/_search
{
  "size": 0,
  "aggs": {
    "code": {
      "terms": {
        "field": "code.keyword"
      },
      "aggs": {
        "date": {
          "date_histogram": {
            "field": "date",
            "calendar_interval": "day"
          },
          "aggs": {
            "avg_code_date": {
              "avg": {
                "field": "price"
              }
            }
          }
        },
        "avg_code": {
          "avg_bucket": {
            "buckets_path": "date>avg_code_date"
          }
        }
      }
    }
  }
}
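
With the documents above, each code's avg_code should be the average of its daily averages: 40000.0 for BTC ((30000 + 50000) / 2) and 6000.0 for ETH ((5000 + 7000) / 2). Abbreviated, the response should contain something like:

"aggregations": {
  "code": {
    "buckets": [
      { "key": "BTC", "avg_code": { "value": 40000.0 }, ... },
      { "key": "ETH", "avg_code": { "value": 6000.0 }, ... }
    ]
  }
}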

But if the documents are true duplicates, deduplicating them at indexing time could be a simpler way.
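
For example (just a sketch, assuming code plus date uniquely identifies a document): indexing with a deterministic _id makes a later duplicate overwrite the earlier copy instead of adding a new document:

PUT /test_btc/_doc/BTC_2022-02-23
{ "code": "BTC", "price": 30000, "date": "2022-02-23" }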

They differ by a few fields.

Thank you for the query. Is this possible in a visualization in the UI? I know of similar methods in Time Series visualizations too, but I need to keep all my visualizations and only change the calculations. Right now they take all codes, not the unique ones per day.

I don't know of a way to visualize the query directly other than using Vega/Vega-Lite. But if it were up to me, I would create a transform and visualize the dest index.

Something like:

GET /_transform/_preview
{
  "pivot": {
    "aggs": {
      "price": {
        "avg": {
          "field": "price"
        }
      }
    },
    "group_by": {
      "code": {
        "terms": {
          "field": "code.keyword"
        }
      },
      "date": {
        "date_histogram": {
          "field": "date",
          "calendar_interval": "day"
        }
      }
    }
  },
  "source": {
    "index": "test_btc"
  }
}
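
If the preview looks right, the transform itself could be created and started along these lines (the transform id and dest index name test_btc_daily are placeholders of my choosing):

PUT /_transform/test_btc_daily
{
  "source": {
    "index": "test_btc"
  },
  "dest": {
    "index": "test_btc_daily"
  },
  "pivot": {
    "group_by": {
      "code": {
        "terms": {
          "field": "code.keyword"
        }
      },
      "date": {
        "date_histogram": {
          "field": "date",
          "calendar_interval": "day"
        }
      }
    },
    "aggs": {
      "price": {
        "avg": {
          "field": "price"
        }
      }
    }
  }
}

POST /_transform/test_btc_daily/_start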

The dest index should look like:

{
  "date" : "2022-02-23T00:00:00.000Z",
  "code" : "BTC",
  "price" : 30000.0
},
{
  "date" : "2022-02-23T00:00:00.000Z",
  "code" : "ETH",
  "price" : 5000.0
},
{
  "date" : "2022-02-24T00:00:00.000Z",
  "code" : "BTC",
  "price" : 50000.0
},
{
  "date" : "2022-02-24T00:00:00.000Z",
  "code" : "ETH",
  "price" : 7000.0
}
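
With the transformed index in place, the original goal (the average price over unique codes per day) reduces to a plain date_histogram with an avg sub-aggregation, which the standard visualization editors can build as well. A sketch, assuming the transform's dest index is named test_btc_daily:

GET /test_btc_daily/_search
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "day"
      },
      "aggs": {
        "avg_over_unique_codes": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

For the sample data this would give (30000 + 5000) / 2 = 17500 for 2022-02-23 and (50000 + 7000) / 2 = 28500 for 2022-02-24, counting each code once per day.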
