We are using ES as a data store for events from devices. By the end of the year I expect to have a few hundred million events being written every day.
My plan is to create two aliases which will get used by clients:
An alias ("events-current") that points to the current day's index
Another ("events-all") that contains all of the event indices.
To do this I am planning to create a script that will:
Export the mappings from the index behind events-current
Create a daily index "events-YYYY.MM.DD"
Apply the mappings from the previous day's index to the new index
Remove the previous day's index from the "events-current" alias
Add the new index to both the "events-current" and "events-all" aliases
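The daily rotation steps above could be sketched in Python as pure functions that build the index name and the request body for Elasticsearch's `POST /_aliases` endpoint. The HTTP call is left to whatever client or `curl` wrapper the script already uses. This is a minimal sketch, assuming the `events-YYYY.MM.DD` naming from the list and that yesterday's index exists; the function names are hypothetical. Because all actions go in one `_aliases` request, Elasticsearch applies them atomically, so clients should never see `events-current` pointing at zero or two indices.

```python
from datetime import date, timedelta

CURRENT_ALIAS = "events-current"
ALL_ALIAS = "events-all"


def daily_index_name(day: date) -> str:
    """Return the daily index name, e.g. events-2017.03.14 (hypothetical helper)."""
    return day.strftime("events-%Y.%m.%d")


def alias_actions(today: date) -> dict:
    """Build the body for POST /_aliases.

    Moves events-current from yesterday's index to today's and adds today's
    index to events-all. Assumes yesterday's index is currently in events-current.
    """
    new_index = daily_index_name(today)
    old_index = daily_index_name(today - timedelta(days=1))
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": CURRENT_ALIAS}},
            {"add": {"index": new_index, "alias": CURRENT_ALIAS}},
            {"add": {"index": new_index, "alias": ALL_ALIAS}},
        ]
    }


if __name__ == "__main__":
    import json

    print(json.dumps(alias_actions(date(2017, 3, 14)), indent=2))
```

Creating the index itself (exporting and applying the previous day's mappings) would be a separate `PUT /events-YYYY.MM.DD` with those mappings in its body, sent before the alias swap.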
I can do this with shell scripts, but there has to be a better way. I'm pretty sure this is the same thing Logstash does by default, but I wanted to know whether I'm missing anything or whether anyone has a suggestion for a better way to set this up.
Sorry for the late response; I was out of town last week. The mappings may change from day to day: when a device registers, it can define a semi-arbitrary set of events that it supports. If I used templates, I'd have to update the template every time I added a new type of event.
Templates do support pattern matching to some degree but I see your point.
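As an illustration of that pattern matching, a template's `dynamic_templates` can assign a type to new fields by name or by detected JSON type, which may cover new event types without editing the template each time. This sketch assumes a typeless (7.x-style) legacy template sent with `PUT /_template/<name>`, and a hypothetical naming convention where boolean fields start with `is_` or `has_`. Rules are tried in order and the first match wins.

```python
def events_template(pattern: str = "events-*") -> dict:
    """Legacy index-template body whose dynamic_templates type new fields.

    The is_/has_ naming convention is a hypothetical example, not a built-in rule.
    """
    return {
        "index_patterns": [pattern],
        "mappings": {
            "dynamic_templates": [
                # New fields named is_* or has_* are mapped as booleans.
                {
                    "flags": {
                        "match_pattern": "regex",
                        "match": "^(is|has)_.*",
                        "mapping": {"type": "boolean"},
                    }
                },
                # Any other new string field becomes a keyword instead of text.
                {
                    "strings": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword"},
                    }
                },
            ]
        },
    }
```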
I should mention that you should pay attention to the total number of fields in your indices. Each field has overhead, and letting devices create whatever fields they want might destabilize the cluster. It's fine if devices don't create more than they need, but you'll be vulnerable to mistakes like UUIDs used as field names.
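A script could keep an eye on this by walking a mapping (for example the `properties` from `GET /<index>/_mapping`), counting fields, and flagging names that look like UUIDs. This is a rough sketch: the count includes object fields and multi-fields, but it only approximates how Elasticsearch enforces `index.mapping.total_fields.limit` (default 1000). The helper names are hypothetical.

```python
import re

DEFAULT_FIELD_LIMIT = 1000  # default value of index.mapping.total_fields.limit

UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def count_fields(properties: dict) -> int:
    """Approximate field count, including object fields and multi-fields."""
    total = 0
    for spec in properties.values():
        total += 1
        # Object/nested fields carry their own sub-properties.
        total += count_fields(spec.get("properties", {}))
        # Multi-fields (e.g. a "raw" keyword sub-field) count as fields too.
        total += count_fields(spec.get("fields", {}))
    return total


def uuid_field_names(properties: dict, prefix: str = "") -> list:
    """Return dotted paths of fields whose names look like UUIDs."""
    found = []
    for name, spec in properties.items():
        path = prefix + name
        if UUID_RE.match(name):
            found.append(path)
        found.extend(uuid_field_names(spec.get("properties", {}), path + "."))
    return found
```

Running both on each new daily index, and alerting when the count approaches the limit, would catch a misbehaving device before it hits the hard limit.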
Thanks for the feedback!
I do think I could do something like
Export the existing index's mapping as a template
Create the new index and make the alias changes
and it would at least save one step.
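The export step might look like this: take the response of `GET /<index>/_mapping` for the current index and rewrap its mappings as a template body that matches future daily indices. A minimal sketch, assuming a typeless (7.x-style) response shape and the legacy `PUT /_template/<name>` format. Older versions nest mappings under a type name, and newer ones prefer `_index_template`, which puts mappings under a `template` key. The function name is hypothetical.

```python
def mapping_to_template(mapping_response: dict, pattern: str = "events-*") -> dict:
    """Turn a GET /<index>/_mapping response into a legacy template body.

    Assumes a typeless response such as:
    {"events-2017.03.13": {"mappings": {"properties": {...}}}}
    """
    if len(mapping_response) != 1:
        raise ValueError("expected the mapping of exactly one index")
    (index_body,) = mapping_response.values()
    return {"index_patterns": [pattern], "mappings": index_body["mappings"]}
```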
We are already using templates to explicitly set some field types where we've had some confusion about the type of data (boolean/string etc.).
There should not be that many event types, maybe 100 or so altogether. Longer term, we're going to split the indices by device type as well. My main concern is that I don't want to be manually updating the templates every time we add a device or new functionality to an existing device.
Yeah, manual would be a pain. When I maintained some production indexes I used "dynamic": false and added the new properties with a script. I just didn't want anything sneaking in properties I didn't know about. But I added properties much less frequently than it sounds like you will.
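That approach could be sketched as two helpers. One creates an index with `"dynamic": false`, so unknown fields stay in `_source` but are neither indexed nor added to the mapping. The other computes which known properties are missing and produces a body for `PUT /<index>/_mapping`. The helper names and the shape of the `wanted` schema are assumptions. Existing field types cannot be changed in place, so the sketch refuses conflicting types instead of sending them.

```python
def strict_index_body(properties: dict) -> dict:
    """Body for PUT /<index> with dynamic mapping disabled (hypothetical helper)."""
    return {"mappings": {"dynamic": False, "properties": properties}}


def new_properties(current: dict, wanted: dict) -> dict:
    """Body for PUT /<index>/_mapping adding properties missing from `current`.

    Raises ValueError if a field already exists with a different type, since
    Elasticsearch does not allow changing an existing field's type.
    """
    conflicts = sorted(
        name
        for name, spec in wanted.items()
        if name in current and current[name].get("type") != spec.get("type")
    )
    if conflicts:
        raise ValueError(f"cannot change existing field types: {conflicts}")
    missing = {name: spec for name, spec in wanted.items() if name not in current}
    return {"properties": missing}
```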