I am building a recommendation engine and collecting about 1000 user events per second on the website (around 10 different event types). The current plan is to use Elasticsearch (ES) as the event store and Logstash as the event producer. Recent blogs and discussions about using ES for time-series data look really promising. However, I am facing a dilemma about how to organise my indexes and searches, since the users are in different time zones.
I have two kinds of data, the current events and the historical events (up to 3 months in the past):
Both have the same format: user, sessionid, item, event_type, timestamp
I thought of indexing the current events in the daily Logstash index, but then I can hit the problem of one session being split across 2 indexes. These queries need to be ultra fast, which is why I wanted daily indexes.
Does it make sense to have a "current" alias that includes today's and yesterday's indexes?
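To make the alias idea concrete, here is a minimal sketch of what I have in mind. It only builds the request body for the `_aliases` endpoint and does not talk to a cluster. The alias name `current`, the function names and the `logstash-YYYY.MM.DD` naming (which, as far as I understand, Logstash creates per UTC day) are my assumptions, and it assumes today's index already exists when the body is sent.

```python
from datetime import date, timedelta


def index_name(day, prefix="logstash-"):
    # Assumed naming scheme: one index per UTC day, e.g. logstash-2015.03.01
    return f"{prefix}{day:%Y.%m.%d}"


def current_alias_actions(today, members, alias="current", prefix="logstash-"):
    """Build an _aliases body so `alias` covers only today and yesterday.

    `members` is the set of index names the alias points to right now.
    """
    keep = [index_name(today - timedelta(days=n), prefix) for n in (0, 1)]
    # Drop every index that has fallen out of the two-day window.
    actions = [
        {"remove": {"index": name, "alias": alias}}
        for name in sorted(members)
        if name not in keep
    ]
    # Add the indexes that are in the window but not yet behind the alias.
    actions += [
        {"add": {"index": name, "alias": alias}}
        for name in keep
        if name not in members
    ]
    return {"actions": actions}
```

The idea would be to run this once a day (shortly after midnight UTC) and POST the result to `_aliases`, so that a session crossing the day boundary is still fully visible through the one alias.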
For historical events, I need to aggregate per user, item and event type over the last 3 months.
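The kind of historical query I am thinking of looks roughly like the sketch below, which builds a search body for one user. The field names come from my event format above. The function name is made up, the `bool`/`filter` syntax assumes a reasonably recent ES version, and I am assuming `user`, `item` and `event_type` are indexed as exact (not analyzed) values so that `term` and `terms` work on them.

```python
def user_history_query(user, months=3, max_items=100):
    """Search body: event counts per item and event type for one user."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"user": user}},
                    # ES date math: everything from `months` months ago until now
                    {"range": {"timestamp": {"gte": f"now-{months}M"}}},
                ]
            }
        },
        "aggs": {
            "by_item": {
                "terms": {"field": "item", "size": max_items},
                "aggs": {
                    "by_event_type": {"terms": {"field": "event_type"}}
                },
            }
        },
    }
```

This body would be sent to `_search` on whatever index pattern or alias covers the 3 months of daily indexes.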
I would need auto-aliasing, plus auto-purging of indexes older than 3 months.
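For the purging part, what I picture is something like the sketch below: given the list of index names in the cluster, pick the daily indexes that are past retention and delete those. The function name is made up, the `logstash-YYYY.MM.DD` naming is assumed, and I approximate "3 months" as 90 days. The actual delete call is left out on purpose.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, today, retention_days=90, prefix="logstash-"):
    """Return the daily indexes whose date is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of the daily event indexes
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # prefix matches but the suffix is not a date
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```

A daily job could feed the result into index delete requests, right after the alias update.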
Does the whole idea make sense? Has anybody done something similar in the past?
Is there an example somewhere of this kind of auto-aliasing and auto-purging being implemented?
Many thanks in advance,