Hi,
We are looking at using ES in an upcoming project and are fleshing out the data models for it. We are hoping for some suggestions on our approach.
Our database consists of ranged and recurring events. We want to be able to query the events index to find, for example, which events are occurring in the current 15-minute time period (e.g. if it's 3.43pm now, I want all events active between 3.30pm and 3.44pm).
Our proposed approach is as follows: each event with its various attributes (e.g. "rrule", "repeat_start", "repeat_until" if recurring; "start", "finish" if not recurring) is initially stored as a document in Riak. Using a post-commit hook, the event is then added to the events index in ES. The indexed document includes a field called "time_periods" that contains the series of 15-minute time periods derived from the "rrule"/"repeat_until" or "start"/"finish" attributes. A time period "2010-11-11T14:00" is a block of time that ranges from "2010-11-11T14:00" until "2010-11-11T14:14" inclusive.
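To make the lookup side concrete, here is a rough Python sketch of how a timestamp could be mapped to its time period and turned into a query body. It is illustrative only: `period_label` and `active_events_query` are made-up names, and it assumes "time_periods" is indexed as exact, untokenized strings so that a plain term query can match it. The exact field path would depend on the mapping.

```python
from datetime import datetime

PERIOD_MINUTES = 15


def period_label(moment: datetime) -> str:
    """Floor a timestamp to the start of its 15-minute period."""
    floored = moment.replace(
        minute=moment.minute - moment.minute % PERIOD_MINUTES,
        second=0,
        microsecond=0,
    )
    return floored.strftime("%Y-%m-%dT%H:%M")


def active_events_query(moment: datetime) -> dict:
    """Build a term query body for events active in the period containing `moment`.

    Assumption: "time_periods" holds exact period labels (not analyzed text).
    """
    return {"query": {"term": {"time_periods": period_label(moment)}}}


# A request made at 14:20 falls into the period labelled 14:15.
query = active_events_query(datetime(2010, 11, 11, 14, 20))
```

The point of the design is that the query side stays a single exact match, with all of the range and recurrence logic pushed to index time.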
For example, here is how two events would be stored and then indexed:
EVENT 1 (truncated). A ranged event =>
{
"event": {
"start": "2010-11-11T14:00",
"finish": "2010-11-11T15:00"
….
}
}
EVENT 1 (truncated). Sent to ES =>
{
"event": {
"time_periods": ["2010-11-11T14:00", "2010-11-11T14:15", "2010-11-11T14:30", "2010-11-11T14:45"]
….
}
}
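The derivation for a ranged event like the one above could look something like this in Python. This is a sketch under assumptions, not our actual code: `time_periods` is a hypothetical helper, "finish" is treated as exclusive (which is why 15:00 does not appear in the list above), and a "start" that falls inside a period is floored to that period's start.

```python
from datetime import datetime, timedelta
from typing import List

PERIOD = timedelta(minutes=15)
LABEL_FORMAT = "%Y-%m-%dT%H:%M"


def time_periods(start: str, finish: str) -> List[str]:
    """Return the labels of every 15-minute period overlapping [start, finish)."""
    current = datetime.strptime(start, LABEL_FORMAT)
    end = datetime.strptime(finish, LABEL_FORMAT)
    # Floor the start to a period boundary so partially covered periods count.
    current -= timedelta(minutes=current.minute % 15)
    labels = []
    while current < end:
        labels.append(current.strftime(LABEL_FORMAT))
        current += PERIOD
    return labels


event_1_periods = time_periods("2010-11-11T14:00", "2010-11-11T15:00")
```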
EVENT 2 (truncated and with a dodgy example rrule). A recurring event =>
{
"event": {
"rrule": "daily 14:00 until 15:00",
"repeat_start": "2010-11-10",
"repeat_until": "2010-12-10"
….
}
}
EVENT 2 (truncated). Sent to ES =>
{
"event": {
"time_periods": ["2010-11-10T14:00", "2010-11-10T14:15", "2010-11-10T14:30", "2010-11-10T14:45", "2010-11-11T14:00", "2010-11-11T14:15", "2010-11-11T14:30", "2010-11-11T14:45"….]
….
}
}
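For the recurring case, the expansion could be sketched roughly as below. Since the rrule above is only a placeholder, this does not parse it: the daily window is passed in as plain times (14:00 to 15:00, which matches the four periods per day listed above), `recurring_time_periods` is a hypothetical name, "repeat_until" is assumed to be inclusive, and the window start is assumed to sit on a 15-minute boundary.

```python
from datetime import date, datetime, time, timedelta
from typing import List

PERIOD = timedelta(minutes=15)
LABEL_FORMAT = "%Y-%m-%dT%H:%M"


def recurring_time_periods(repeat_start: date, repeat_until: date,
                           window_start: time, window_end: time) -> List[str]:
    """Expand a daily recurring window into 15-minute period labels."""
    labels = []
    day = repeat_start
    while day <= repeat_until:  # assumption: repeat_until is inclusive
        current = datetime.combine(day, window_start)
        end = datetime.combine(day, window_end)
        while current < end:  # window end is exclusive
            labels.append(current.strftime(LABEL_FORMAT))
            current += PERIOD
        day += timedelta(days=1)
    return labels


event_2_periods = recurring_time_periods(
    date(2010, 11, 10), date(2010, 12, 10), time(14, 0), time(15, 0)
)
```

Under these assumptions the example event expands to 124 values (31 days, 4 periods per day).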
At scheduled intervals, a recurring event's time periods are updated: old ones are removed and new ones are added. The time periods form a rolling window of values that typically spans about 2 months.
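The bookkeeping for such a scheduled update might be sketched like this. `diff_time_periods` is a made-up helper and the labels are sample values; it only works out which periods leave and enter the window, and does not cover how the change is then written to ES.

```python
from typing import Iterable, Set, Tuple


def diff_time_periods(indexed: Iterable[str],
                      wanted: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (periods to remove, periods to add) for one event."""
    indexed_set, wanted_set = set(indexed), set(wanted)
    to_remove = indexed_set - wanted_set  # periods that have dropped out of the window
    to_add = wanted_set - indexed_set     # periods that have newly entered the window
    return to_remove, to_add


to_remove, to_add = diff_time_periods(
    indexed=["2010-11-10T14:00", "2010-11-10T14:15", "2010-11-11T14:00"],
    wanted=["2010-11-11T14:00", "2010-11-12T14:00"],
)
```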
Our concern is that some recurring events are continuous (24 hours a day, every day), and others recur at multiple times during the same day. This leads to a large number of time periods for some events: 5,952 time periods in 2 months for a 24/7 event (96 periods per day over 62 days). The database will have about 10,000 such frequently recurring events. We worry that our proposed approach might bring ES to a crawl, especially during indexing.
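As a back-of-the-envelope check on those numbers, here is the arithmetic in Python. The 62-day window is an assumption (two 31-day months), chosen because it reproduces the 5,952 figure; the total is a worst case in which all 10,000 events run 24/7.

```python
PERIODS_PER_HOUR = 60 // 15                # four 15-minute periods per hour
PERIODS_PER_DAY = 24 * PERIODS_PER_HOUR    # 96
WINDOW_DAYS = 62                           # assumption: two 31-day months
EVENT_COUNT = 10_000                       # frequently recurring events

periods_per_event = PERIODS_PER_DAY * WINDOW_DAYS    # 5,952 values in one field
total_values = periods_per_event * EVENT_COUNT       # 59,520,000 values overall

print(periods_per_event, total_values)
```

So the worst case is on the order of 60 million "time_periods" values across the index, with a steady churn as the window rolls forward.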
Is it reasonable to use ES in this way? Can ES handle a field with thousands of values? Does anyone have suggestions for a more efficient way to index and query ranged/recurring events?
Thanks in advance,
Mark