My team and I are looking for a way to keep index sizes below a certain threshold, and the Rollover API looks promising. We already use Curator for data retention and thought we could add rollover actions there.
One problem we have, though, is with dynamic index names.
We are using Filebeat to ship logs directly to Elasticsearch from tens of different applications running on cloud infrastructure. Each application has its own index name, with monthly suffixes, which simplifies the data retention task.
The question is: how do we tell Filebeat that, when it creates a new index at the start of each month, it should really create an alias plus an initial index that can be rolled over when the size condition is met?
The only solution we have found is to create the aliases upfront and point Filebeat to them, but this would have to be repeated every month, so it's not a clean solution.
Is there something we are missing?
Also, how do we create a generic rollover action in Curator that matches all possible indices?
To truly simplify your data retention task, you should probably rethink the need for dynamically named indices with monthly suffixes. Rollover and Curator eliminate the need for adding dates to the index name.
For each Filebeat index you need, you would create one initial index with its accompanying alias, and then use Curator to roll it over when the conditions are met.
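As a rough sketch of what that involves at the REST level, the snippet below only builds the two request descriptions (it does not send anything). The alias name `app1-logs` and the `5gb`/`30d` thresholds are placeholders I made up, and the `max_size` condition assumes an Elasticsearch version that supports it.

```python
import json


def bootstrap_request(alias: str) -> dict:
    """Describe the request that creates the first index together with its alias."""
    # Rollover expects the index name to end in a dash and a number.
    first_index = f"{alias}-000001"
    return {
        "method": "PUT",
        "path": f"/{first_index}",
        "body": {"aliases": {alias: {}}},
    }


def rollover_request(alias: str, max_size: str = "5gb", max_age: str = "30d") -> dict:
    """Describe a rollover call; a new index is created if any condition matches."""
    return {
        "method": "POST",
        "path": f"/{alias}/_rollover",
        # Placeholder thresholds; max_size assumes your version supports it.
        "body": {"conditions": {"max_size": max_size, "max_age": max_age}},
    }


if __name__ == "__main__":
    for req in (bootstrap_request("app1-logs"), rollover_request("app1-logs")):
        print(req["method"], req["path"])
        print(json.dumps(req["body"], indent=2))
```

The bootstrap step is needed only once per alias. After that, Filebeat writes to the alias, and Curator's rollover action (or a direct call like the second request) takes care of creating the next index.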
You do not need to create a new index every month when you can just use Rollover. For retention, use creation_date to filter on when each index was created. Alternatively, you could use field_stats to determine the min_value and/or max_value of the timestamp field in each index.
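To illustrate the idea behind filtering on creation_date, here is a minimal sketch of the selection logic, roughly what an age filter with creation_date as its source does. It assumes you have already fetched each index's `index.creation_date` setting (epoch milliseconds). The function name and sample index names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def older_than(creation_dates_ms: dict, days: int, now: datetime) -> list:
    """Return the names of indices created more than `days` days before `now`.

    `creation_dates_ms` maps index name -> creation date in epoch milliseconds
    (as a string or an int).
    """
    cutoff = now - timedelta(days=days)
    expired = []
    for name, created_ms in creation_dates_ms.items():
        created = datetime.fromtimestamp(int(created_ms) / 1000, tz=timezone.utc)
        if created < cutoff:
            expired.append(name)
    return sorted(expired)
```

Because the decision is based on index metadata rather than on a date in the name, it works the same for foo-000001 as it would for a monthly-suffixed index.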
We thought about removing the monthly suffixes, but we'd still have dynamic index names coming from the application name. Would you suggest getting rid of those as well?
Using the name of the application in the index name gives us a couple of benefits.
Developers of different applications can focus on their own application's logs by selecting just the index pattern for that application, e.g. in Kibana. We could still achieve this by adding a dedicated field to the log documents and then creating read aliases that filter documents on that field, but we are not sure how much of a performance penalty that would incur.
Different applications might use the same field names but with different mappings, and keeping separate indices helps avoid clashes.
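For reference, the filtered read alias we have in mind would be created with a body along these lines for `POST /_aliases`. This is only a sketch: the field name `application`, the index pattern and the alias name are examples, not something we have deployed.

```python
def filtered_alias_action(index_pattern: str, alias: str, field: str, value: str) -> dict:
    """Build a `POST /_aliases` body that adds an alias restricted by a term filter."""
    return {
        "actions": [
            {
                "add": {
                    "index": index_pattern,
                    "alias": alias,
                    # Searches through the alias only see documents matching this filter.
                    "filter": {"term": {field: value}},
                }
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: all shared log indices, filtered down to one application.
    print(filtered_alias_action("logs-*", "app1-logs-read", "application", "app1"))
```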
Yes, remove dynamic index names to fully use Rollover. It doesn't mean you have to live with mapping conflicts, though. You can still create one rollover pattern per data type.
For example, if filebeat1 has the same data type as filebeat8, you could make a rollover alias called foo which points to foo-000001. And if filebeat2 has the same data type as filebeat5, you could make a rollover alias called bar which points to bar-000001.
Keep the data types behind each alias consistent with each other and there are no more mapping conflicts. Just configure Filebeat to point to the alias for that data type.
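A small sketch of how that grouping and naming plays out, using the example names above. The mapping table is purely illustrative. The naming function mimics how Rollover derives the next index name (increment the trailing number, zero-padded to six digits); it is not a call into Elasticsearch.

```python
import re

# Illustrative grouping: sources with the same data type share one rollover alias.
ALIAS_FOR_APP = {
    "filebeat1": "foo",
    "filebeat8": "foo",
    "filebeat2": "bar",
    "filebeat5": "bar",
}


def next_rollover_index(current: str) -> str:
    """Return the index name that would follow `current` after a rollover."""
    match = re.fullmatch(r"(.+)-(\d+)", current)
    if match is None:
        raise ValueError(f"{current!r} does not end in a dash and a number")
    prefix, counter = match.groups()
    return f"{prefix}-{int(counter) + 1:06d}"


if __name__ == "__main__":
    print(ALIAS_FOR_APP["filebeat8"])         # alias that filebeat8 writes to
    print(next_rollover_index("foo-000001"))  # index created by the next rollover
```

Each Filebeat instance only ever needs to know its alias. The numbered indices behind it change over time without any change to the Filebeat configuration.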