I would like to use Elasticsearch to index incoming "events" data. Events are described by a number of metadata fields (such as event type, category, service type, service name, etc.). Apart from the metadata fields, there is more descriptive information that is stored elsewhere. In ES I would like to store only the metadata and make it searchable (to filter and find events by the metadata fields above).
The event data arrives every minute, and it mostly carries the same metadata each time. I want a retention period of 1 year for searching over events. I was thinking of using 30-day indices (since the rollover API does not support months for max_age) and deleting indices once they are 1 year old (i.e. keeping 12 of them). However, this means a lot of the data is duplicated every 30 days: 90% of the events are the same month over month.
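For reference, the rollover-and-delete plan above could be expressed as an index lifecycle management (ILM) policy. The sketch below only builds and prints the policy body in Python; the 30d rollover and 365d retention come from the plan, while using ILM (rather than calling the rollover API by hand) and the policy name are assumptions. Note that 12 × 30 days is only 360 days; a delete min_age of 365d, which ILM measures from the rollover time for rolled-over indices, keeps each index for roughly a year after it stops receiving writes.

```python
import json

ROLLOVER_MAX_AGE = "30d"  # from the plan: start a new index every 30 days
RETENTION = "365d"        # assumption: delete an index ~1 year after it rolls over


def build_ilm_policy(max_age: str = ROLLOVER_MAX_AGE, retention: str = RETENTION) -> dict:
    """Return an ILM policy body: roll over in the hot phase, delete after retention."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_age": max_age}
                    }
                },
                "delete": {
                    # For rolled-over indices, min_age counts from the rollover time.
                    "min_age": retention,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Body for: PUT _ilm/policy/events-policy  (policy name is hypothetical)
    print(json.dumps(build_ilm_policy(), indent=2))
```

This does nothing about the duplication itself; it only captures the retention scheme so the storage cost of roughly 12 to 13 overlapping indices is easy to reason about.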
Is there a better index design for this kind of data? Is there a way to index only the new (delta) events into new indices and query the old indices for events that already exist?
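One way to picture the "delta only" idea: derive a stable fingerprint from the metadata fields and index only events whose fingerprint has not been seen before. This is a hypothetical sketch, not an Elasticsearch feature; the field names, the SHA-256 fingerprint, and the in-memory set of known fingerprints are all assumptions (in practice the known set would have to be loaded from the existing indices). Using the fingerprint as the document _id would only deduplicate within one index, because _id uniqueness is per index.

```python
import hashlib
import json

# Hypothetical names for the metadata fields listed in the question.
METADATA_FIELDS = ("event_type", "category", "service_type", "service_name")


def fingerprint(event: dict) -> str:
    """Stable SHA-256 hex digest over the metadata fields only."""
    key = {field: event.get(field) for field in METADATA_FIELDS}
    canonical = json.dumps(key, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def delta_events(incoming, known_fingerprints):
    """Return (fingerprint, event) pairs for events not seen before.

    Duplicates inside the incoming batch are dropped as well.
    """
    seen = set(known_fingerprints)
    delta = []
    for event in incoming:
        fp = fingerprint(event)
        if fp not in seen:
            seen.add(fp)
            delta.append((fp, event))
    return delta


if __name__ == "__main__":
    old = {"event_type": "alert", "category": "db",
           "service_type": "sql", "service_name": "orders"}
    new = {"event_type": "alert", "category": "db",
           "service_type": "sql", "service_name": "billing"}
    known = {fingerprint(old)}  # pretend "old" is already indexed
    for fp, ev in delta_events([old, new, dict(new)], known):
        print(fp[:12], ev["service_name"])
```

Only the "billing" event survives here: "old" is already known, and the second copy of "new" is a duplicate within the batch. The trade-off is that a search over a time range then has to treat an event as present in every month after it was first indexed, rather than finding a fresh copy in each monthly index.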