Average per day of week aggregation

atopalov · March 15, 2018, 3:35pm

Hello,

I have a message index with 'date' and 'channelId' fields:

POST message/_doc
{
  "channelId": "channel_twitter",
  "date": "2018-03-08T14:01:23.385Z"
}

I need to create an aggregation that calculates the average number of messages per day of the week, per channel. For example, for Twitter: an average of 5 messages on Mondays, 10 messages on Tuesdays, etc.
Following the day-of-week example in the Date Histogram documentation, I managed to get the total number of messages per channel per day of week:

{
  "size": 0,
  "aggs": {
    "messages_per_channel": {
      "terms": {
        "field": "channelId"
      },
      "aggregations": {
        "total_messages_per_day_of_week": {
          "terms": {
            "script": {
              "lang": "painless",
              "source": "doc['date'].value.dayOfWeek"
            }
          }
        }
      }
    }
  }
}

but I can't figure out how to calculate an average based on the total number of Mondays, Tuesdays etc. since the first message.

abdon · March 19, 2018, 12:45pm

You could use pipeline aggregations for this. Pipeline aggregations are aggregations that work on the result of other aggregations.

In this case, I would first create a weekly histogram inside of your total_messages_per_day_of_week aggregation. This will create a bucket per week of data, and the number of buckets in this histogram corresponds to the total number of Mondays, Tuesdays etc.

Next, I would use a bucket_script aggregation to divide the total messages per weekday by the number of weeks of data. The bucket_script aggregation can calculate a metric based on the outputs of other aggregations by referring to those aggregations by their paths. It has access to special paths like _count (which gives the total number of messages in the bucket) and _bucket_count (which gives the number of buckets in a multi-bucket aggregation, here the number of weeks). Dividing one by the other gets you what you want.

Putting it all together, the request would look like this:

{
  "size": 0,
  "aggs": {
    "messages_per_channel": {
      "terms": {
        "field": "channelId"
      },
      "aggs": {
        "total_messages_per_day_of_week": {
          "terms": {
            "script": {
              "lang": "painless",
              "source": "doc['date'].value.dayOfWeek"
            }
          },
          "aggs": {
            "number_of_weeks": {
              "date_histogram": {
                "field": "date",
                "interval": "week"
              }
            },
            "average_messages_per_day_of_week": {
              "bucket_script": {
                "buckets_path": {
                  "doc_count": "_count",
                  "number_of_weeks": "number_of_weeks._bucket_count"
                },
                "script": "params.doc_count / params.number_of_weeks"
              }
            }
          }
        }
      }
    }
  }
}
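
To make it easier to check what this request computes, here is a rough client-side sketch of the same arithmetic in plain Python. It is not part of the request above; the function names and the sample data are made up for illustration. It assumes dates are in UTC, that weekly buckets start on Monday, and that the weekly histogram also emits empty buckets between the first and last week that contain data (which is what I would expect with min_doc_count left at 0). Note that under these assumptions the divisor is the span of weeks in which that weekday had messages for that channel, not the number of weeks since the very first message in the index.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def week_start(ts):
    """Return the Monday (as a date) of the week containing ts."""
    day = ts.date()
    return day - timedelta(days=day.weekday())


def average_per_day_of_week(messages):
    """messages: iterable of (channel_id, datetime) pairs.

    Returns {(channel_id, weekday): average}, with weekday 1=Monday .. 7=Sunday.
    """
    counts = defaultdict(int)   # plays the role of _count
    weeks = defaultdict(set)    # weeks that contain at least one message
    for channel, ts in messages:
        key = (channel, ts.isoweekday())
        counts[key] += 1
        weeks[key].add(week_start(ts))

    averages = {}
    for key, total in counts.items():
        first, last = min(weeks[key]), max(weeks[key])
        # Assumption: empty weeks between first and last are counted too,
        # mirroring a weekly histogram that fills gaps (_bucket_count).
        number_of_weeks = (last - first).days // 7 + 1
        averages[key] = total / number_of_weeks
    return averages


# Hypothetical sample: Thursdays 2018-03-08 (4 messages) and 2018-03-22 (2),
# plus 3 messages on Monday 2018-03-05.
sample = (
    [("channel_twitter", datetime(2018, 3, 8, 14, 0))] * 4
    + [("channel_twitter", datetime(2018, 3, 22, 9, 30))] * 2
    + [("channel_twitter", datetime(2018, 3, 5, 8, 0))] * 3
)
averages = average_per_day_of_week(sample)
```

For the Thursday bucket the sample spans three weeks (the middle one empty), so 6 messages give an average of 2.0; the Monday bucket has a single week, so its average is 3.0.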

atopalov · March 20, 2018, 1:21pm

Works beautifully. Thanks a lot!

system · April 17, 2018, 1:21pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
