Hi, I'm new to the Elasticsearch scene and have quite a few questions.
First of all, my use case at the moment is pretty simple: when a user performs a "registration" action on my site, I want to add a document logging this occurrence so that I can track and visualize in Kibana when and how many users are registering.
After a lot of reading, my high-level takeaway is that I want time-based indices. A daily or weekly index seems appropriate: slow days are no problem, and a day where many users register won't blow up a single index.
A common approach seems to be to set up an index template and just let the index be created when you insert (i.e. a logs_* template where the * is today's date). Another, as far as I can tell, is to create the indices yourself and set up one alias for "today's index" and another alias covering the past 3 months for searching.
Unfortunately, I'm limited to AWS ES, so it seems I can't use the fancy new Rollover API, which was a bummer since the blog has a nice post about it.
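To make the template approach concrete, here is a minimal sketch of what such a template body could look like, built as a plain Python dict. It assumes the Elasticsearch 2.x template format (a top-level `template` pattern); the mapping type `registration`, the field names, and the shard counts are placeholders I made up, not anything prescribed.

```python
import json


def build_logs_template(pattern="logs_*"):
    """Return an index template body in the Elasticsearch 2.x format.

    The mapping type, field names, and shard settings are illustrative
    assumptions; adjust them to the real documents.
    """
    return {
        # Applied to every newly created index whose name matches the pattern.
        "template": pattern,
        "settings": {"number_of_shards": 1, "number_of_replicas": 1},
        "mappings": {
            "registration": {
                "properties": {
                    "@timestamp": {"type": "date"},
                    # not_analyzed keeps the ID as a single exact term (2.x syntax).
                    "user_id": {"type": "string", "index": "not_analyzed"},
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_logs_template(), indent=2))
```

The printed JSON would be sent once with a PUT to `_template/<some_name>`; after that, any index created with a matching name picks up the settings and mappings automatically.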
So, in summary my main questions are:
For my use case, would it be better to rely on auto-generated indices, or try to have more control with aliases?
These documents won't be modified after insert, so are there any optimizations I can make?
How can I manage old indices? Once the day/week ends, I no longer need to devote any resources to writing to that index, just to reading it for the occasional search.
If I do go with aliases, since I don't have the Rollover API, what's the best way to make sure my aliases are kept up to date and my old indices are removed from that "last 3 months" window?
I've decided auto-generated indices will probably be the easiest for me; I'm going to specify the index as logs_YYYY-MM-DD whenever adding a doc so it'll just be auto-generated for the new days.
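For reference, a small sketch of how the index name could be derived when adding a doc. The helper name `daily_index_name` and the choice of UTC for the day boundary are my own assumptions.

```python
from datetime import datetime, timezone


def daily_index_name(when=None, prefix="logs_"):
    """Build a logs_YYYY-MM-DD index name for the given moment (default: now, UTC)."""
    when = when or datetime.now(timezone.utc)
    return prefix + when.strftime("%Y-%m-%d")
```

Using UTC consistently means a document and its index name never disagree about which day it belongs to, regardless of the server's local time zone.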
This allows me to basically ditch the alias work, like you suggested. I may still end up using aliases later on, when I have additional use cases that involve a lot more documents, since it would be nice to limit the number of shards being accessed through an alias instead of running a query against the entire set of available/open indices with time as a filter.
I've also been checking out Curator. I plan on following this blog post to set Curator up with AWS ES once I've gotten it all working on my local machine.
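If I do come back to aliases, I imagine a scheduled job could compute the alias changes itself, without the Rollover API. Below is a rough sketch that builds the request body for the `_aliases` endpoint. The alias name `logs_last_3_months`, the 90-day window, and the helper names are assumptions on my part, and the caller would have to supply the list of existing indices and the alias's current members.

```python
from datetime import date, datetime, timedelta

PREFIX = "logs_"


def index_date(name):
    """Parse the date out of a logs_YYYY-MM-DD name; return None if it doesn't fit."""
    if not name.startswith(PREFIX):
        return None
    try:
        return datetime.strptime(name[len(PREFIX):], "%Y-%m-%d").date()
    except ValueError:
        return None


def alias_actions(all_indices, current_members, today,
                  alias="logs_last_3_months", window_days=90):
    """Return an _aliases body that moves the alias to the trailing window."""
    cutoff = today - timedelta(days=window_days)
    actions = []
    for name in sorted(all_indices):
        day = index_date(name)
        if day is None:
            continue  # not one of the daily log indices
        in_window = cutoff <= day <= today
        if in_window and name not in current_members:
            actions.append({"add": {"index": name, "alias": alias}})
        elif not in_window and name in current_members:
            actions.append({"remove": {"index": name, "alias": alias}})
    return {"actions": actions}
```

Since all actions in one `_aliases` request are applied together, searches against the alias shouldn't see a half-updated window.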
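As I understand it, the core of what I want from Curator is a name-based age filter: pick the indices older than some number of days, then close or delete them. This is only my own illustration of that idea in plain Python, not Curator's code or API; the function name and the 30-day default are made up.

```python
from datetime import date, datetime, timedelta


def indices_older_than(names, today, days=30, prefix="logs_"):
    """Return the daily indices whose date in the name is more than `days` old."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y-%m-%d").date()
        except ValueError:
            continue  # skip names that don't follow the daily pattern
        if day < cutoff:
            old.append(name)
    return sorted(old)
```

The returned names would then be the candidates for closing or deleting, so only recent indices keep consuming resources.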
Oh, I did check out the compatibility matrix: I'm using Curator 3.5 and plan to run it against our AWS ES 2.3. I know it says I won't be able to take snapshots, but are you saying it still wouldn't work?
In your opinion, Aaron, am I going about this wrong? My main goal here is to store one-off events from my site that won't change, and to visualize in Kibana how many are happening and when. Given how many warnings there are about needing to scale, preventing data loss, and the crippling performance issues that can arise if everything isn't perfect from the get-go, perhaps I've overcomplicated things by trying to set up all these time-based indices and aliases, and by using Curator to "optimize" and close indices over time? Perhaps you could suggest some common metric-storing architectures or stacks using ES that I can look into?