Need explanations about Filebeat & ILM

Hi,

I’m trying to understand a few things before setting up my cluster.

1 - Why is the default Filebeat output index filebeat-%{[agent.version]}-%{+yyyy.MM.dd} now that using ILM is recommended? Why let Filebeat create one index per day when this is now supposed to be controlled by ILM? Why not just filebeat-%{[agent.version]}?

2 - From what I read, it is a best practice not to mix prod and non-prod logs. I have 2 Docker clusters, with Filebeat on both, and 1 ES cluster (3 nodes). I wanted to add the cluster name to the index name but, like others, I ran into the issue with ILM: "Cannon simple change index name in filebeat"
My goal is to keep the production logs for 1 year and the logs from the other environments for 3 months.
Right now I don't see any solution other than having 2 separate ES clusters (which could be a good idea, I know).

Thanks for your advice
Regards

Hello,

Any help is greatly appreciated.

Thank you.

Hi @smux and welcome to discuss! :slight_smile:

Let me try to answer your questions.

Having the version in the index name is good because it allows field mappings to be changed and improved between versions.
Having indexes created per time period is important because it makes it easy to archive or remove old data. If you have all your data in a single index, removing old data means reindexing while filtering that data out, which is tricky if you are also writing to the same index at the same time. But if your indexes are created by date, you can simply delete the older indexes while you continue writing to the newest one.

Also, single indexes holding a lot of data can hurt performance, so it is better to split them somehow in any case. For logs and metrics, splitting by time makes sense, because newer and older data have different needs and expectations.

Under the hood all of this is managed by ILM, and Filebeat actually writes to an alias called just filebeat-%{[agent.version]}, as you would expect without considering the low-level details. So with ILM you get the best of both worlds: simple configuration on the Beats side and automatic index management in ES. The defaults are designed to work well in the most common cases.
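To make the alias idea concrete, here is a minimal sketch of the ILM-related settings in filebeat.yml, written out explicitly. The values are what I understand the 7.x defaults to be, which is an assumption on my side; they can differ between versions, so check the reference for your release.

```yaml
# filebeat.yml: ILM settings written out explicitly.
# Values are assumed 7.x defaults; verify them against your Filebeat version.
setup.ilm.enabled: auto                                   # use ILM if the Elasticsearch cluster supports it
setup.ilm.rollover_alias: "filebeat-%{[agent.version]}"   # write alias that Filebeat sends events to
setup.ilm.pattern: "{now/d}-000001"                       # suffix of the first backing index behind the alias
```

With settings like these, the backing indexes get names of the form filebeat-<version>-<date>-000001. The date is taken when the index is created, and a new index appears when the ILM policy's rollover conditions are met rather than strictly once per day.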

Yes, it is usually good not to mix prod and non-prod logs, because you probably have different expectations for them, such as the different retention periods you mention.
There are some options for doing this in a single cluster, though: you can send the logs to different indexes.

If your production workloads run on separate machines, you can give the Filebeat instances on prod and non-prod machines different configurations. For example, setting these two options on the non-prod machines would set up a separate policy and send the logs to a separate index, prefixed with filebeat-dev:

setup.ilm.rollover_alias: 'filebeat-dev-%{[agent.version]}'
setup.ilm.policy_name: 'filebeat-dev-%{[agent.version]}'

You could then customize the ILM policy using Kibana or the ILM API.

The index pattern in Kibana would work with both indexes, because by default it is configured as filebeat-*. You would need to add a field in Filebeat to be able to distinguish the two workloads in dashboards. For that you can use the add_fields processor.
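A minimal sketch of the add_fields processor for this purpose follows. The field name environment and the value dev are only illustrative, not something Filebeat defines; pick whatever fits your setup.

```yaml
# filebeat.yml on the non-prod machines.
# "environment" and "dev" are hypothetical names chosen for this example.
processors:
  - add_fields:
      # No target is set, so the field lands under the default "fields" key,
      # i.e. each event gets fields.environment: dev
      fields:
        environment: dev
```

On the production machines the same block would carry a different value (for example prod), and dashboards can then filter on fields.environment.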

If you run production and non-production workloads on the same machine, you could run two Filebeat instances on it, each handling one kind of logs and each with its own ILM configuration.

There are more options you could explore. For example, you can disable ILM (with setup.ilm.enabled: false) and manage the index lifecycle on your own. The Elasticsearch output also lets you select which index to use depending on the content of the event, using the indices option.
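As a rough sketch of that last option, the configuration below disables ILM and routes events by a field value. It assumes events already carry a fields.environment value (a hypothetical field, added for instance with add_fields), and the host, index names, and template settings are placeholders to adapt.

```yaml
# filebeat.yml: ILM disabled, index chosen per event.
# Host, index names, and the fields.environment field are assumptions for illustration.
setup.ilm.enabled: false

# The custom index names no longer match the default template pattern,
# so the template name and pattern are set explicitly.
setup.template.name: "filebeat"
setup.template.pattern: "filebeat-*"

output.elasticsearch:
  hosts: ["localhost:9200"]
  indices:
    # The first rule whose condition matches decides the index.
    - index: "filebeat-prod-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.equals:
        fields.environment: "prod"
    - index: "filebeat-dev-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.equals:
        fields.environment: "dev"
```

Events that match no rule fall back to the output's regular index setting. With ILM off, retention (1 year for prod, 3 months for the rest) has to be enforced by other means, such as deleting the daily indexes once they are old enough.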
