Ranged and recurring events


(mjarecki) #1

Hi,

We are looking at using ES in an upcoming project and are fleshing out data models for it - and are hoping for some suggestions on our approach.

Our database consists of ranged and recurring events. We want to be able to query the events index to find, for example, what events are occurring in the current 15 minute time period (e.g. if it's 3.43pm now, I want all events active between 3.45pm - 3.59pm).

Our proposed approach is such: each event with its various attributes (e.g. "rrule", "repeat_start", "repeat_until" if recurring; "start", "finish" if not recurring) is initially stored as a document in Riak. Using a post-commit-hook, the event is added to the events index in ES. A field called "time_periods" is included that contains a series of 15 minute time period instances derived from the "rrule"/"repeat_until" or "start"/"finish" attributes. A time period "2010-11-11T14:00" is a block of time that ranges from "2010-11-11T14:00" until "2010-11-11T14:14".

For example:

EVENT 1 (truncated). A ranged event =>
{
"event": {
"start": "2010-11-11T14:00",
"finish": "2010-11-11T15:00"
….
}
}

EVENT 1 (truncated). Sent to ES =>
{
"event": {
"time_periods": ["2010-11-11T14:00", "2010-11-11T14:15", "2010-11-11T14:30", "2010-11-11T14:45"]
….
}
}

EVENT 2 (truncated and with a dodgy example rrule). A recurring event =>
{
"event": {
"rrule": "daily 14:00 until 14:30",
"repeat_start": "2010-11-10",
"repeat_until": "2010-12-10"
….
}
}

EVENT 2 (truncated). Sent to ES =>
{
"event": {
"time_periods": ["2010-11-10T14:00", "2010-11-10T14:15", "2010-11-10T14:30", "2010-11-10T14:45", "2010-11-11T14:00", "2010-11-11T14:15", "2010-11-11T14:30", "2010-11-11T14:45"….]
….
}
}
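A minimal Python sketch of how the "time_periods" values above could be derived from "start"/"finish". It assumes "finish" is exclusive, which matches EVENT 1 (14:00 to 15:00 yields four periods and no 15:00 entry), and that a start time falling inside a period snaps down to that period's boundary. The function name `time_periods` and the constants are illustrative, not part of any library.

```python
from datetime import datetime, timedelta
from typing import List

PERIOD = timedelta(minutes=15)
FMT = "%Y-%m-%dT%H:%M"  # same timestamp layout as the examples in this post


def time_periods(start: str, finish: str) -> List[str]:
    """Return the 15-minute period labels covering [start, finish)."""
    begin = datetime.strptime(start, FMT)
    end = datetime.strptime(finish, FMT)
    # Snap the start down to the enclosing quarter-hour boundary.
    current = begin.replace(minute=begin.minute - begin.minute % 15)
    periods = []
    while current < end:  # assumption: finish is exclusive
        periods.append(current.strftime(FMT))
        current += PERIOD
    return periods


print(time_periods("2010-11-11T14:00", "2010-11-11T15:00"))
```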

At scheduled intervals, a recurring event's time periods are updated - old ones removed, new ones added. The time periods are a floating set of values that might typically span 2 months.

Our concern is that some recurring events are continuous (24 hours/day, every day), and others recur daily at multiple times during the same day. This leads to a large number of time periods for some events - 5952 time periods in 2 months (24/7 15 minute periods). The database will have about 10000 such frequently recurring events. We worry that our proposed approach might bring ES to a crawl - especially during indexing.
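For reference, the arithmetic behind those figures, as a small Python sketch. The 62-day window is an assumption (two 31-day months), chosen because it reproduces the 5952 figure; the total across all events is simply that number multiplied by the 10000 events mentioned above.

```python
PERIODS_PER_DAY = 24 * 60 // 15  # 96 quarter-hour periods in a day
DAYS = 62                        # assumption: two 31-day months
EVENTS = 10_000                  # frequently recurring events, from the post

periods_per_event = PERIODS_PER_DAY * DAYS
total_values = periods_per_event * EVENTS

print(periods_per_event)  # 5952
print(total_values)       # 59520000
```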

Is it reasonable to use ES in this way? Can ES handle a field with thousands of values? Does anyone have suggestions on how better to efficiently index and query ranged/recurring events?

Thanks in advance,

Mark


(ppearcy) #2

Hey Mark,
I think you may be better off using a date-range-per-doc approach,
where each event has multiple ranges and each of those ranges is a
document. So, using your example:

EVENT 1 (truncated). A ranged event =>
{
"event": {
"start": "2010-11-11T14:00",
"finish": "2010-11-11T15:00"
….
}
}

EVENT 1 (truncated). Sent to ES =>
{
"event": {
"eventid": "1",
"start": "2010-11-11T14:00",
"finish": "2010-11-11T15:00"
….
}
}

In this case the range query is easy:
start:[* TO <time>] AND finish:[<time> TO *]
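The same check can be extended from a single instant to a 15-minute window: a document matches when it starts on or before the window's end and finishes on or after the window's start. Below is a hedged Python sketch that builds such a request body with `bool`/`filter` and `range` clauses. That syntax belongs to Elasticsearch releases much newer than the one discussed in this thread, and the sketch assumes "start" and "finish" are indexed as top-level date fields. The helper names `active_between` and `overlaps` are made up for illustration.

```python
def active_between(window_start: str, window_end: str) -> dict:
    """Build a search body for documents whose [start, finish] overlaps the window."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"start": {"lte": window_end}}},
                    {"range": {"finish": {"gte": window_start}}},
                ]
            }
        }
    }


def overlaps(doc: dict, window_start: str, window_end: str) -> bool:
    """Same condition evaluated locally.

    Timestamps in one fixed ISO-8601 layout compare chronologically as strings.
    """
    return doc["start"] <= window_end and doc["finish"] >= window_start


event1 = {"eventid": "1", "start": "2010-11-11T14:00", "finish": "2010-11-11T15:00"}
print(overlaps(event1, "2010-11-11T14:45", "2010-11-11T14:59"))
```

Both bounds are inclusive here, so an event finishing at exactly 15:00 would also match a window that starts at 15:00; switch to strict comparisons (`gt`/`lt`) if that is not wanted.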

EVENT 2 A recurring event =>
{
"event": {
"rrule": "daily 14:00 until 14:30",
"repeat_start": "2010-11-10",
"repeat_until": "2010-12-10"
….
}
}

EVENT 2 Sent to ES =>
{
"event": {
"eventid": "2",
"start": "2010-11-10T14:00",
"finish": "2010-11-10T14:30"
}
}

{
"event": {
"eventid": "2",
"start": "2010-11-11T14:00",
"finish": "2010-11-11T14:30"
}
}

and repeat for each occurrence. The same range query as above should
work to find the events.
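A rough Python sketch of the "repeat for each occurrence" step for a simple daily recurrence. It does not parse a real rrule; it assumes the daily start and finish times are already known and that "repeat_until" is inclusive. The generator name `daily_occurrences` is hypothetical.

```python
from datetime import date, timedelta


def daily_occurrences(event_id, repeat_start, repeat_until, start_time, finish_time):
    """Yield one document per day from repeat_start to repeat_until (inclusive)."""
    day = date.fromisoformat(repeat_start)
    last = date.fromisoformat(repeat_until)
    while day <= last:
        yield {
            "event": {
                "eventid": event_id,
                "start": f"{day.isoformat()}T{start_time}",
                "finish": f"{day.isoformat()}T{finish_time}",
            }
        }
        day += timedelta(days=1)


docs = list(daily_occurrences("2", "2010-11-10", "2010-12-10", "14:00", "14:30"))
print(len(docs), docs[0])
```

With this layout, a 24/7 event becomes one document per day (or a single long range) rather than thousands of values in one field.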

This may feel like a slightly SQL-esque approach, but that's because
it's a SQL-esque problem :)

There may be better solutions, as I am not familiar with all of the ES
search options.

Hope this helps,
Paul



(ppearcy) #3

Also, the approach I mention should give your events a 1 second
granularity if you use the date field type.



(mjarecki) #4

Thanks Paul.



(Shay Banon) #5

I agree with Paul, it sounds like a better solution and simpler to manage.
Note that the date type supports up to millisecond resolution, not second level
resolution (it is stored internally as milliseconds since the epoch).
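To illustrate what "milliseconds since the epoch" means for the timestamps in this thread, here is a small Python sketch. It assumes the timestamps are UTC, which the thread does not state; `to_epoch_millis` is an illustrative helper, not an Elasticsearch API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(stamp: str) -> int:
    """Parse 'YYYY-MM-DDTHH:MM' as UTC and return whole milliseconds since the epoch."""
    moment = datetime.strptime(stamp, "%Y-%m-%dT%H:%M").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


start = to_epoch_millis("2010-11-11T14:00")
finish = to_epoch_millis("2010-11-11T15:00")
print(start, finish - start)
```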



(mjarecki) #6

Just to clarify, would you duplicate the entire event document in ES (including other searchable and non-searchable fields)?
Or would you only index the searchable fields, and then query the index to generate a list of ids and do a key lookup in Riak to fetch the full event documents?

Thanks again,

Mark



(Shay Banon) #7

It's really up to you. Most times, I prefer to index more data, so it can be
fetched as part of the search request without needing to go to another
system. Going to another system (Riak, a db, ...) will mean higher latency and
more load in general when searching.

-shay.banon
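A hedged Python sketch of the "index more data" option: each per-occurrence document carries a copy of the parent event's other fields, so a search hit can be displayed without a second lookup. The "title" field and the `denormalize` helper are hypothetical and only there for illustration.

```python
# Fields that describe the schedule itself and are replaced per occurrence.
SCHEDULE_FIELDS = ("start", "finish", "rrule", "repeat_start", "repeat_until")


def denormalize(event: dict, occurrences: list) -> list:
    """Copy the event's non-schedule fields into every occurrence document."""
    shared = {k: v for k, v in event.items() if k not in SCHEDULE_FIELDS}
    return [{"event": {**shared, **occurrence}} for occurrence in occurrences]


event = {
    "eventid": "2",
    "title": "Daily stand-up",  # hypothetical extra field
    "rrule": "daily 14:00 until 14:30",
    "repeat_start": "2010-11-10",
    "repeat_until": "2010-12-10",
}
occurrences = [
    {"start": "2010-11-10T14:00", "finish": "2010-11-10T14:30"},
    {"start": "2010-11-11T14:00", "finish": "2010-11-11T14:30"},
]
docs = denormalize(event, occurrences)
print(docs[0])
```

The alternative is to index only "eventid", "start" and "finish" and fetch the full event from Riak by key, which trades index size for an extra round trip per search.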



(ppearcy) #8

You also want to make sure you understand your long-term search
requirements.

For example, if you want to search for an event happening at a certain
time combined with some search input, you'd need all the data pushed
down to each doc.

There are various trade-offs in terms of complexity and performance. I
prefer to keep things as simple as possible until the performance
suffers. For your case, this would mean figuring out how much
content you are going to be searching and making trade-offs based on
that.

Kimchy, thanks for the clarification regarding the date field's
millisecond granularity. Good to know.

Best Regards,
Paul


