Using Filebeat with the rollover pattern


My team and I are looking for a way to keep the size of our indices below a certain threshold, and we are looking at the Rollover API with interest. We already use Curator for data retention and thought we could add rollover actions there.

One problem we have, though, is with dynamic index names.

We are using Filebeat to ship logs directly to Elasticsearch from tens of different applications running on cloud infrastructure. Each application has its own index name, with monthly suffixes, which simplifies the data retention task.

The question is how do we tell Filebeat that, when creating a new index at the start of each month, it should really create an alias + an initial index that can be rolled over when the size condition matches?

The only solution we have found is to create aliases upfront and point Filebeat to such aliases, but this would need to be done every month and it's not a clean solution.

Is there something we are missing?
Also, how do we create a generic rollover action in Curator that matches all possible indices?

Thanks for any help

A question: why is an alias required on the write side? What is the read pattern? Have you considered using aliases for the read side? Understanding that will help in answering this question.

BTW, one tool that might help in cases like yours in the future is the upcoming Index Lifecycle Management (ILM) feature in ES, which will be the best solution for this problem.

Let me respond to this part of your question:

To truly simplify your data retention task, you should probably rethink the need for dynamically named indices with monthly suffixes. Rollover and Curator eliminate the need for adding dates to the index name.

From the official documentation:

PUT /logs-000001
{
  "aliases": {
    "logs_write": {}
  }
}

For each Filebeat index you need, you would create one initial index with its accompanying alias, and then roll over with Curator when the conditions are met.
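As a rough illustration of that per-application bootstrap, here is a minimal Python sketch that only builds the request (it does not send anything to a cluster). The `<app>-000001` / `<app>_write` naming and the application names are assumptions made for this example, modelled on the documentation snippet above.

```python
import json


def bootstrap_request(app_name):
    """Build the request that creates the first index and its write alias.

    The naming scheme (<app>-000001, <app>_write) is an assumption for
    this sketch; any name ending in a number works with rollover.
    """
    index = f"{app_name}-000001"
    alias = f"{app_name}_write"
    body = {"aliases": {alias: {}}}
    return "PUT", f"/{index}", body


# Hypothetical application names, one bootstrap request per application.
for app in ["app1", "app2"]:
    method, path, body = bootstrap_request(app)
    print(method, path)
    print(json.dumps(body, indent=2))
```

This only has to run once per application, not once per month: after the first index exists, rollover creates the following ones.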

You do not need to create a new index every month when you can just use Rollover. Use creation_date to filter based on when the index was created. Alternatively, you could use field_stats to determine the min_value and/or max_value for the timestamp field in each index.
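To make the rollover step concrete, the sketch below builds the body of a Rollover API call for a write alias. It is only an illustration: the `20gb` threshold is an assumed value, and the `max_size` condition requires an Elasticsearch version that supports it (6.1 or later, as far as I know).

```python
import json


def rollover_request(alias, max_size="20gb", max_age=None):
    """Build a Rollover API request for the given write alias.

    The default threshold is an assumption for this sketch; pick
    values that match your own sizing goals.
    """
    conditions = {"max_size": max_size}
    if max_age is not None:
        conditions["max_age"] = max_age
    return "POST", f"/{alias}/_rollover", {"conditions": conditions}


method, path, body = rollover_request("logs_write", max_age="30d")
print(method, path)
print(json.dumps(body, indent=2))
```

The index is rolled over as soon as any one of the listed conditions is met, so a size limit and an age limit can be combined.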

why is an alias required on the write side?

The write alias would allow Filebeat to be agnostic about rollover operations.

What is the read pattern?

We use index patterns that match all indices for a specific application.

Thanks for your input.

We thought about removing the monthly suffixes, but we'd still have dynamic index names coming from the application name. Would you suggest getting rid of those as well?

Using the name of the application in the index name gives us a couple of benefits.

  1. Developers of different applications can focus on their own application logs by selecting just the index pattern for that application, e.g. in Kibana. We could still achieve this by using a dedicated field in the log documents and then creating read aliases that filter documents based on that field, but we are not sure how much of a performance penalty that would incur.

  2. Different applications might use the same field names but with different mappings, and keeping separate indices helps avoid clashes.
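For reference, the filtered read alias mentioned in point 1 could look roughly like the sketch below, which builds an `_aliases` request with a `term` filter. The field name `application`, the `logs-*` index pattern and the `<app>_read` alias name are all hypothetical, and the term filter assumes the field is mapped as a keyword.

```python
import json


def filtered_read_alias(app_name, index_pattern="logs-*", field="application"):
    """Build an _aliases request adding a per-application filtered alias.

    Field name, index pattern and alias naming are assumptions made
    for this sketch only.
    """
    action = {
        "add": {
            "index": index_pattern,
            "alias": f"{app_name}_read",
            # Only documents whose field equals the application name
            # are visible through the alias.
            "filter": {"term": {field: app_name}},
        }
    }
    return "POST", "/_aliases", {"actions": [action]}


method, path, body = filtered_read_alias("app1")
print(method, path)
print(json.dumps(body, indent=2))
```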

Thanks for helping with this.

I think that yes, for your scenario removing the dynamic names is the only way, but I'll defer to @theuntergeek on this one.

I see, @Andrew_Cholakian1. OK, let's wait to hear from @theuntergeek then.

I am keen to understand how we should address the possible field mapping conflicts if we went for that solution.

Yes, remove dynamic index names to fully use Rollover. It doesn't mean you have to live with mapping conflicts, though. You can still create one rollover pattern per data type.

For example, if filebeat1 has the same data type as filebeat8, you could make a rollover alias called foo which points to foo-000001. And if filebeat2 has the same data type as filebeat5, you could make a rollover alias called bar which points to bar-000001.

Keep the data types consistent within each alias and there are no more mapping conflicts. Just configure each Filebeat instance to point to the alias for its data type.
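A small Python sketch of that grouping, using the example names from above (`filebeat1`, `filebeat8`, `foo`, and so on); the mapping itself is illustrative, and in practice you would fill it in from what you know about each application's fields.

```python
# Which data type each Filebeat source produces (illustrative values).
APP_DATA_TYPE = {
    "filebeat1": "foo",
    "filebeat8": "foo",
    "filebeat2": "bar",
    "filebeat5": "bar",
}


def plan_aliases(app_data_type):
    """Group sources by data type: one rollover alias per data type."""
    plan = {}
    for app, data_type in sorted(app_data_type.items()):
        entry = plan.setdefault(
            data_type,
            {"first_index": f"{data_type}-000001", "apps": []},
        )
        entry["apps"].append(app)
    return plan


for alias, entry in plan_aliases(APP_DATA_TYPE).items():
    print(alias, "->", entry["first_index"], "written by", entry["apps"])
```

The number of aliases then follows the number of distinct data types rather than the number of applications.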

Thanks for your input, @theuntergeek.

We experimented with this solution but we'd end up having as many aliases as the number of applications.

Would it be an option to use ingest pipelines, perhaps with a scripting processor, to create an index alias on the fly? If this is possible, we would be able to retain our per-application index names.

It would actually be easier to flatten your data more and put more fields into an index than go that route, I think. Otherwise, yes, as many aliases as applications is still what I’d recommend.
