How to use date_histogram to aggregate documents that span multiple period based on two date fields

Hi,

So I have documents defined based on the following mapping:

documents:
        "ID": {
          "type": "integer"
        },
        "Start": {
          "type": "date",
          "format": "strict_date_optional_time"
        },
        "End": {
          "type": "date",
          "format": "strict_date_optional_time"
        },
        "User": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },

And here is a sample representing the data I might have:
(Small sample to explain but please bare in mind that in reality I'll be handling millions of documents!)

ID: 1, Start: 2020-06-08T12:00:00.000Z, End: 2020-06-09T12:00:00.000Z, User: MyUser1
ID: 2, Start: 2020-06-12T13:00:00.000Z, End: 2020-06-12T14:00:00.000Z, User: MyUser2
ID: 3, Start: 2020-06-16T12:00:00.000Z, End: 2020-06-23T12:00:00.000Z, User: MyUser2
ID: 4, Start: 2020-06-17T12:00:00.000Z, End: 2020-06-17T13:00:00.000Z, User: MyUser1
ID: 5, Start: 2020-06-17T12:00:00.000Z, End: 2020-06-22T12:00:00.000Z, User: MyUser3
ID: 6, Start: 2020-06-18T12:00:00.000Z, End: 2020-06-19T12:00:00.000Z, User: MyUser2

So, what I'd like to be able to do is to compute the number of distinct "User" that "exists" during each period of time for multiple resolution (day by day, week by week etc.).
By "exists" I mean that the "Start" to "End" range of a document is part of or overlap with a period of selected resolution.

In the end what I should have as a result based on previous sample would be something like this:

Period              nb_disctinct_users

1d
------------------------------------------
Day June 8          1
Day June 9          1
Day June 12         1
Day June 16         1
Day June 17         3
Day June 18         2
Day June 19         2
    ...             2
Day June 22         2
Day June 23         1
Day June 24         0

1w
-----------------------------------------
Week 8-14           2
Week 15-21          3
Week 22-28          2

The thing is, I already know how to do this sort of request to compute/agg on a specific column date with something like this:

{
    "aggs" : {
        "date" : {
            "date_histogram" : {
                "field" : "Start",
                "interval" : "1d"
            },
	      "aggs": {
		"user_count": { "cardinality": { "field": "User.keyword" } }
        }
    }}
}

But I can't figure out how to take into account "ranges" with dates that overlaps periods of the selected resolution.

Any idea on how to tackle this problem ?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.