Hi team,
Problem statement
I want to store events over time as new records in my storage system. For example, imagine I want to monitor taxi location changes. I could keep the data in the format [taxi_id, time_start, time_end, location_id].
A location can be a geofence or any other kind of region.
Type of Queries
Then I want to query various things within a time interval, e.g. how much time a taxi driver spent in location_1 during the interval [11:00 - 12:00] today.
My data is clearly time-series in nature, since I add a new record every time an event is triggered.
The problem is that a single record may span a bigger interval than the one given by my user (e.g. from a UI application).
E.g. the record [Robert Denniro, 12/13 10:00 - 12/13 11:30, location_1].
If the query is how much time Robert spent in location_1 during the interval [12/13 10:00 - 12/13 11:00], only the first hour of that record should count.
So I can't aggregate with something fixed like SUM(EndTime - StartTime). I need to compute the overlap between each record in my DB and the user's interval(s).
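To make the overlap check concrete, here is a minimal Python sketch of what I mean. It is not tied to Elasticsearch or any other storage engine; the function names, the tuple layout, and the year 2023 are assumptions I made up purely for illustration.

```python
from datetime import datetime

def overlap_seconds(rec_start, rec_end, q_start, q_end):
    """Seconds of the record [rec_start, rec_end] that fall inside [q_start, q_end]."""
    start = max(rec_start, q_start)   # later of the two starts
    end = min(rec_end, q_end)         # earlier of the two ends
    # A negative difference means the intervals do not overlap at all.
    return max((end - start).total_seconds(), 0.0)

def time_in_location(records, taxi_id, location_id, q_start, q_end):
    """Sum the clipped durations of all matching records (hypothetical helper)."""
    return sum(
        overlap_seconds(start, end, q_start, q_end)
        for tid, start, end, loc in records
        if tid == taxi_id and loc == location_id
    )

# Hypothetical data in the format [taxi_id, time_start, time_end, location_id];
# the year is arbitrary because my example above does not specify one.
records = [
    ("robert", datetime(2023, 12, 13, 10, 0), datetime(2023, 12, 13, 11, 30), "location_1"),
]

spent = time_in_location(
    records, "robert", "location_1",
    datetime(2023, 12, 13, 10, 0), datetime(2023, 12, 13, 11, 0),
)
print(spent)  # 3600.0 seconds, i.e. one hour rather than the full 90 minutes
```

The key point is the clipping with max/min: each record contributes only the part that lies inside the user's interval.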
Two questions:
- Is Elasticsearch a good fit for this kind of query (data with intervals, i.e. a StartTime and an EndTime)?
- Does Elasticsearch provide out-of-the-box support for this kind of aggregation?
As far as I can see, there are two approaches for calculating such queries:
- Do the calculation against the raw data records. In this case a script/method has to be executed for each row to find the intersection between the record's interval and the user's interval.
- Cook the data in advance, in such a way that when the user provides an interval, every matching record falls entirely inside it. Then I could use a plain aggregation such as
SELECT SUM(EndTime - StartTime)
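As a rough sketch of the second approach, records could be split at fixed boundaries before they are stored. The example below assumes hourly boundaries and assumes that user intervals are always aligned to full hours; both are my own assumptions, and split_by_hour is a hypothetical helper, not something any storage system provides.

```python
from datetime import datetime, timedelta

def split_by_hour(taxi_id, start, end, location_id):
    """Cut one record into pieces that never cross an hour boundary."""
    pieces = []
    cursor = start
    while cursor < end:
        # Start of the hour following the cursor position.
        next_hour = cursor.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
        piece_end = min(next_hour, end)
        pieces.append((taxi_id, cursor, piece_end, location_id))
        cursor = piece_end
    return pieces

# The record from my example (the year is arbitrary).
pieces = split_by_hour(
    "robert",
    datetime(2023, 12, 13, 10, 0),
    datetime(2023, 12, 13, 11, 30),
    "location_1",
)

# With pre-split data, a query for [10:00 - 11:00] only needs the pieces that
# lie entirely inside the interval, and a plain sum of durations is correct.
q_start, q_end = datetime(2023, 12, 13, 10, 0), datetime(2023, 12, 13, 11, 0)
total = sum(
    (e - s).total_seconds()
    for _, s, e, _ in pieces
    if s >= q_start and e <= q_end
)
print(len(pieces), total)
```

The trade-off is more stored rows and a fixed granularity: an interval like [10:15 - 10:45] would still need the per-record overlap calculation.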
I'd appreciate your feedback on this.
Thanks