Date_histogram over terms bucket

Hi, I'm trying to build a date histogram on top of the buckets produced by a terms aggregation, but somehow the date histogram returns no data. The mapping, sample data, and query below show the scenario.

PUT view_log
{
  "mappings": {
    "vcc-analytics": {
      "properties": {
        "session_id": { "type": "keyword" },
        "videoId": { "type": "text" },
        "watchTime": { "type": "integer" },
        "creationDate": { "type": "date" }
      }
    }
  }
}

POST /view_log/_bulk
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s1", "creationDate":"2019-05-01T10:10:10", "watchTime":"0", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s1", "creationDate":"2019-05-01T10:10:40", "watchTime":"30", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s1", "creationDate":"2019-05-01T10:11:10", "watchTime":"60", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s1", "creationDate":"2019-05-01T10:11:40", "watchTime":"90", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s2", "creationDate":"2019-05-01T12:10:10", "watchTime":"0", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s2", "creationDate":"2019-05-01T12:10:40", "watchTime":"30", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s2", "creationDate":"2019-05-01T12:11:10", "watchTime":"60", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s2", "creationDate":"2019-05-01T12:11:40", "watchTime":"90", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s3", "creationDate":"2019-06-01T10:10:10", "watchTime":"0", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s3", "creationDate":"2019-06-01T10:10:40", "watchTime":"30", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s3", "creationDate":"2019-06-01T10:11:10", "watchTime":"60", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s3", "creationDate":"2019-06-01T10:11:40", "watchTime":"90", "videoId":"v-1"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s4", "creationDate":"2019-06-01T12:10:10", "watchTime":"0", "videoId":"v-2"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s4", "creationDate":"2019-06-01T12:10:40", "watchTime":"30", "videoId":"v-2"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s4", "creationDate":"2019-06-01T12:11:10", "watchTime":"60", "videoId":"v-2"}
{"index":{"_index":"view_log","_type":"vcc-analytics"}}
{"session_id":"s4", "creationDate":"2019-06-01T12:11:40", "watchTime":"90", "videoId":"v-2"}

GET view_log/_search
{
  "size": 0,
  "aggs": {
    "session": {
      "terms": { "field": "session_id" },
      "aggs": {
        "max_play_time": {
          "max": { "field": "watchTime" }
        },
        "min_creation_time": {
          "min": { "field": "creationDate" }
        }
      }
    },
    "sessions_over_time": {
      "date_histogram": {
        "field": "session>min_creation_time",
        "interval": "day"
      }
    }
  }
}

I'm expecting a response in which, for each given date, I can see the number of sessions and their corresponding play times. Please help me with this.

Hi Madala,
This and other forms of large-scale behavioural analysis are best performed using an entity-centric index rather than an event-centric index of raw logs.

Thanks Mark! I'm new to Elasticsearch and would like to check my understanding. Does that mean all the required queries and aggregations have to run against an entity-centric index, which has to be built by pre-processing all the event-centric documents?

In my case I keep receiving view-log requests (events) that capture video playback data, such as the time played so far, every 30 seconds. To create an entity-centric document, I need to pre-process all the events for a session and create a single document with all the necessary attributes, which can then be queried. Is my understanding correct?
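The pre-processing described above can be sketched in plain Python, independent of any Elasticsearch client. This is only an illustration under assumptions: the output field names `start_time`, `max_watch_time`, and `video_ids` are hypothetical, and the sample events are a subset (first and last event per session) of the bulk data posted earlier in the thread.

```python
from collections import defaultdict
from datetime import datetime


def build_session_docs(events):
    """Fold raw view-log events into one summary document per session.

    Output field names (start_time, max_watch_time, video_ids) are
    hypothetical; they are not part of the original mapping.
    """
    sessions = {}
    for event in events:
        created = datetime.fromisoformat(event["creationDate"])
        watch_time = int(event["watchTime"])  # sample data stores it as a string
        doc = sessions.get(event["session_id"])
        if doc is None:
            sessions[event["session_id"]] = {
                "session_id": event["session_id"],
                "start_time": created,
                "max_watch_time": watch_time,
                "video_ids": [event["videoId"]],
            }
            continue
        doc["start_time"] = min(doc["start_time"], created)
        doc["max_watch_time"] = max(doc["max_watch_time"], watch_time)
        if event["videoId"] not in doc["video_ids"]:
            doc["video_ids"].append(event["videoId"])
    return list(sessions.values())


def sessions_per_day(session_docs):
    """Count sessions and sum their play time, keyed by session start day."""
    per_day = defaultdict(lambda: {"sessions": 0, "total_watch_time": 0})
    for doc in session_docs:
        day = doc["start_time"].date().isoformat()
        per_day[day]["sessions"] += 1
        per_day[day]["total_watch_time"] += doc["max_watch_time"]
    return dict(per_day)


# First and last event of each session from the bulk request above.
events = [
    {"session_id": "s1", "creationDate": "2019-05-01T10:10:10", "watchTime": "0", "videoId": "v-1"},
    {"session_id": "s1", "creationDate": "2019-05-01T10:11:40", "watchTime": "90", "videoId": "v-1"},
    {"session_id": "s2", "creationDate": "2019-05-01T12:10:10", "watchTime": "0", "videoId": "v-1"},
    {"session_id": "s2", "creationDate": "2019-05-01T12:11:40", "watchTime": "90", "videoId": "v-1"},
    {"session_id": "s3", "creationDate": "2019-06-01T10:10:10", "watchTime": "0", "videoId": "v-1"},
    {"session_id": "s3", "creationDate": "2019-06-01T10:11:40", "watchTime": "90", "videoId": "v-1"},
    {"session_id": "s4", "creationDate": "2019-06-01T12:10:10", "watchTime": "0", "videoId": "v-2"},
    {"session_id": "s4", "creationDate": "2019-06-01T12:11:40", "watchTime": "90", "videoId": "v-2"},
]

session_docs = build_session_docs(events)
daily_summary = sessions_per_day(session_docs)
```

If documents shaped like `session_docs` were indexed into their own session index, the per-day roll-up done here by `sessions_per_day` would presumably become an ordinary `date_histogram` on the session start field with a metric sub-aggregation on the play time, with no need to reference one aggregation's output from another.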

For certain operations, yes, this is required on large-scale systems.
Some things, like "most active user", can be determined relatively easily on an event-centric index, but others, like your example, are much more taxing because too many distributed joins are required.
While it may sound like a pain to have to create an entity-centric index, there are benefits you may not have considered. For example, with each session holding a list of the video IDs that were watched, it would be possible to build a "people who watched X also watched Y" style recommendation system. I know of music streaming services that use Elasticsearch in this way.
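To make the "people who watched X also watched Y" idea concrete, here is a minimal Python sketch that counts co-watched videos across session documents. The `video_ids` field and the sample sessions are hypothetical and made up for illustration; they are not data from this thread, and this is not a description of how any particular streaming service does it.

```python
from collections import Counter


def also_watched(session_docs, video_id):
    """Rank other videos by how many sessions watched them together with video_id."""
    counts = Counter()
    for doc in session_docs:
        videos = set(doc["video_ids"])
        if video_id in videos:
            # Count every other video seen in the same session once.
            counts.update(videos - {video_id})
    return counts.most_common()


# Hypothetical entity-centric session documents.
sample_sessions = [
    {"session_id": "a", "video_ids": ["v-1", "v-2"]},
    {"session_id": "b", "video_ids": ["v-1", "v-2", "v-3"]},
    {"session_id": "c", "video_ids": ["v-2", "v-3"]},
    {"session_id": "d", "video_ids": ["v-1"]},
]

recommendations = also_watched(sample_sessions, "v-1")
```

Inside Elasticsearch, a comparable ranking could likely be obtained by filtering the session index on the video of interest and running a `terms` or `significant_terms` aggregation over the video ID field, assuming that field is mapped as `keyword`.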
