Date Histogram buckets using top_hits of terms over last 30 days

Hello Elasticsearch,

I'm having trouble finding the right combination of aggregations for a query. For each day, I need to take the most recent document per ID from the preceding 30 days and read a value from it. Depending on that value, I want to map the document to a 1 or a 0, and then sum those values across all IDs for each day.

I have data that looks something like this:

{ "id": 1, "date": "2020-01-01T00:00:00Z", "status": 10}
{ "id": 1, "date": "2020-01-04T00:00:00Z", "status": 8}
{ "id": 1, "date": "2020-01-06T00:00:00Z", "status": 10}
{ "id": 2, "date": "2020-01-02T00:00:00Z", "status": 10}
{ "id": 2, "date": "2020-01-03T00:00:00Z", "status": 8}

I want a bucket for each day in the time range, in this case: [2020-01-01, 2020-01-02, 2020-01-03, 2020-01-04, 2020-01-05, 2020-01-06, 2020-01-07]. For each day, I need the most recent document (according to the "date" field) from the past 30 days, for each individual ID.

Day: 2020-01-01
    { "id": 1, "date": "2020-01-01T00:00:00Z", "status": 10}
    { "id": 2, "date": "2020-01-02T00:00:00Z", "status": 10}

Day: 2020-01-02
    { "id": 1, "date": "2020-01-01T00:00:00Z", "status": 10}
    { "id": 2, "date": "2020-01-02T00:00:00Z", "status": 10}

Day: 2020-01-03
    { "id": 1, "date": "2020-01-01T00:00:00Z", "status": 10}
    { "id": 2, "date": "2020-01-03T00:00:00Z", "status": 8}

Day: 2020-01-04
    { "id": 1, "date": "2020-01-04T00:00:00Z", "status": 8}
    { "id": 2, "date": "2020-01-03T00:00:00Z", "status": 8}

Day: 2020-01-05
    { "id": 1, "date": "2020-01-04T00:00:00Z", "status": 8}
    { "id": 2, "date": "2020-01-03T00:00:00Z", "status": 8}

Day: 2020-01-06
    { "id": 1, "date": "2020-01-06T00:00:00Z", "status": 10}
    { "id": 2, "date": "2020-01-03T00:00:00Z", "status": 8}

Then, for each document selected in a bucket, I need to apply a predicate such as status == 8 ? 1 : 0, which reduces each document to a value that I can then sum for each day.
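To pin down the expected numbers, here is a minimal sketch of the computation described above in plain Python, outside Elasticsearch. It is only a reference for the intended result, not a proposed query. The function name `daily_counts`, the day-granular window (a document counts for a day if its date falls on that day or within the 30 days before it), and the `target_status=8` default are assumptions made for illustration. Because only documents dated on or before a given day are considered, ID 2 contributes nothing on 2020-01-01 in this sketch.

```python
from datetime import date, datetime, timedelta

# Sample documents copied from the post.
DOCS = [
    {"id": 1, "date": "2020-01-01T00:00:00Z", "status": 10},
    {"id": 1, "date": "2020-01-04T00:00:00Z", "status": 8},
    {"id": 1, "date": "2020-01-06T00:00:00Z", "status": 10},
    {"id": 2, "date": "2020-01-02T00:00:00Z", "status": 10},
    {"id": 2, "date": "2020-01-03T00:00:00Z", "status": 8},
]


def parse_ts(ts: str) -> datetime:
    """Parse the timestamp format used in the sample documents."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def daily_counts(docs, first_day, last_day, window_days=30, target_status=8):
    """For each day, count the IDs whose latest document in the trailing
    window has status == target_status (assumed window: day-granular)."""
    result = {}
    day = first_day
    while day <= last_day:
        window_start = day - timedelta(days=window_days)
        latest = {}  # id -> (timestamp, status) of the newest doc in the window
        for doc in docs:
            ts = parse_ts(doc["date"])
            if window_start < ts.date() <= day:
                if doc["id"] not in latest or ts > latest[doc["id"]][0]:
                    latest[doc["id"]] = (ts, doc["status"])
        # Map each ID's latest status to 1 or 0 and sum over all IDs.
        result[day.isoformat()] = sum(
            1 for _, status in latest.values() if status == target_status
        )
        day += timedelta(days=1)
    return result


counts = daily_counts(DOCS, date(2020, 1, 1), date(2020, 1, 6))
for day, total in counts.items():
    print(day, total)
```

For the sample data this gives 0, 0, 1, 2, 2, 1 for 2020-01-01 through 2020-01-06, which is the per-day total the final date_histogram should contain.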

Is there a way to do this? I've tried things like using the Moving Function Aggregation for the "most recent document in the past 30 days" check. That works pretty well, except that the result is nested as buckets of IDs > buckets of days > 1 or 0 for each day. I can't figure out how to sum the values across the ID buckets for each day, so that the resulting aggregation is a date_histogram rather than ID buckets that each contain a date histogram.
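The one-line moving_fn script in the query further down is hard to read, so here is the same backward scan written out in Python. This is only a readable restatement of the script's logic, not something Elasticsearch runs. The names `last_known_flag` and `MISSING` are made up for this sketch. Note that the prose above uses status == 8 while the script compares against 5, so the target value is left as a parameter here.

```python
MISSING = -1  # sentinel value that the script in the post skips over


def last_known_flag(values, target_status):
    """Scan the window from newest to oldest and map the first real
    status found to 1 or 0; return 0 if the window has no real status."""
    for value in reversed(values):
        if value != MISSING:
            return 1 if value == target_status else 0
    return 0


# The newest real value in this window is 8, so with target 8 the flag is 1.
print(last_known_flag([10, MISSING, 8, MISSING, MISSING], target_status=8))
```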

Please let me know if I can provide more info, thank you!

Perhaps it would help to see the query as well. What I really want is to sum each individual day across every ID bucket, so that the overall structure is still a date_histogram, with each day's total being the sum of that day's values across the "epkeys" buckets.

In the database, "id" is a UUID string.

POST craftedindex/_search
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "*",
          "analyze_wildcard": true
        }
      },
      "filter": {
        "range": {
          "date": {
            "gte": "2020-01-10T00:00:00||-5d/d",
            "lt": "2020-01-10T00:00:00||+5d/d"
          }
        }
      }
    }
  },
  "aggs": {
    "epkeys": {
      "terms": {
        "field": "id.keyword"
      },
      "aggs": {
        "status_dates": {
          "date_histogram": {
            "field": "date",
            "interval": "1d",
            "min_doc_count": 0
          },
          "aggs": {
            "status_mapper": {
              "max": {
                "field": "status",
                "missing": -1
              }
            },
            "the_movfn": {
              "moving_fn": {
                "buckets_path": "status_mapper",
                "window": 30,
                "gap_policy": "insert_zeros",
                "script": "if (values.length > 0) {for (int i = values.length - 1; i > -1; i--) { if (values[i] != -1) { return values[i] == 5 ? 1 : 0;} } } return 0;"
              }
            }
          }
        },
        "sumit": {
          "sum_bucket": {
            "buckets_path": "status_dates>the_movfn"
          }
        }
      }
    }
  },
  "size": 0,
  "sort": {
    "date": "desc"
  }
}
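As a possible fallback, the nested result of this query could be re-pivoted on the client. This is a minimal Python sketch and not an Elasticsearch-side answer. It assumes the usual aggregation response layout (`aggregations` > `epkeys` > `buckets`, each with `status_dates` > `buckets` carrying `key_as_string` and `the_movfn.value`). The function name `sum_by_day` and the sample response are hand-written for illustration, not real output. It also assumes every ID of interest is present in the terms buckets; the terms aggregation returns only 10 buckets by default unless `size` is set.

```python
from collections import defaultdict


def sum_by_day(response, terms_name="epkeys",
               histogram_name="status_dates", value_name="the_movfn"):
    """Collapse ID buckets > day buckets > value into one total per day."""
    totals = defaultdict(float)
    for id_bucket in response["aggregations"][terms_name]["buckets"]:
        for day_bucket in id_bucket[histogram_name]["buckets"]:
            # A bucket may lack the pipeline value or carry null; skip those.
            value = (day_bucket.get(value_name) or {}).get("value")
            if value is not None:
                totals[day_bucket["key_as_string"]] += value
    return dict(sorted(totals.items()))


# Hand-written sample in the assumed response shape (hypothetical values).
sample = {
    "aggregations": {
        "epkeys": {
            "buckets": [
                {"key": "id-a", "status_dates": {"buckets": [
                    {"key_as_string": "2020-01-03T00:00:00.000Z", "the_movfn": {"value": 0.0}},
                    {"key_as_string": "2020-01-04T00:00:00.000Z", "the_movfn": {"value": 1.0}},
                ]}},
                {"key": "id-b", "status_dates": {"buckets": [
                    {"key_as_string": "2020-01-03T00:00:00.000Z", "the_movfn": {"value": 1.0}},
                    {"key_as_string": "2020-01-04T00:00:00.000Z", "the_movfn": {"value": 1.0}},
                ]}},
            ]
        }
    }
}

totals = sum_by_day(sample)
for day, total in totals.items():
    print(day, total)
```

The trade-off is that every per-ID, per-day value has to be shipped to the client, which is why a single date_histogram computed inside Elasticsearch would still be preferable.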
