Filter for specific field values and create another index for long-term storage

I have Filebeat sending logs from multiple files into an Elasticsearch index. What is the best way to filter those logs and store only the filtered ones in an index with a longer retention period (a longer ILM policy)?

k8s pod logs → written to a file → Filebeat pods read the file and send specifically annotated logs to an index. Now I want to filter for specific logs within Elasticsearch and keep only those for a longer period. What is the best way to do this?

I’ve looked into rollup and transform jobs in Kibana, but they seem geared toward aggregation rather than simply keeping the same documents so we can filter further when investigating a problem. They also don’t delete the documents from the source, so we would end up with duplicated documents, right?

Thank you so much in advance!

Hello @C_Shah

Welcome to the Community!!

Yes, there will be duplicated data for a short time:

Filebeat → logs-raw (ILM 7 days)
                ↓
      Transform (continuous)
                ↓
     logs-long-retention (ILM 180 days)
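The two retention periods in the diagram come from two ILM policies. A minimal sketch, assuming a hot phase with rollover followed by a delete phase; the policy names mirror the index names above, and the rollover thresholds (1 day or 50 GB per primary shard) are illustrative assumptions, not recommendations:

```python
import json


def ilm_policy(delete_after: str, rollover_age: str = "1d") -> dict:
    """Build an ILM policy body that rolls over in the hot phase and deletes after `delete_after`."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new backing index daily or once a primary shard reaches 50 GB
                        "rollover": {
                            "max_age": rollover_age,
                            "max_primary_shard_size": "50gb",
                        }
                    }
                },
                "delete": {
                    # With rollover enabled, min_age is counted from the rollover time
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


# Policy names mirror the diagram; rollover thresholds are assumptions
POLICIES = {
    "logs-raw": ilm_policy("7d"),
    "logs-long-retention": ilm_policy("180d"),
}

if __name__ == "__main__":
    for name, body in POLICIES.items():
        # Paste the output into Kibana Dev Tools or send it with any HTTP client
        print(f"PUT _ilm/policy/{name}")
        print(json.dumps(body, indent=2))
```

The raw index then only ever holds about a week of data, so the duplication window is bounded by the shorter policy.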

We can also filter the data at the source, i.e. while the raw data is being indexed, so that unwanted records are dropped on the first write and no duplicate data is stored. Whether this works for you depends on a review of your use case.
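One way to drop records at indexing time is an ingest pipeline with a `drop` processor. A minimal sketch, assuming a hypothetical rule of keeping only logs whose `kubernetes.namespace` is `payments` and a hypothetical pipeline name `logs-raw-filter`; Filebeat's own `drop_event` processor can apply a similar rule on the shipper side instead.

```python
import json

# Hypothetical condition: keep only logs from the "payments" Kubernetes namespace.
# The ?. operator keeps the Painless expression null-safe when the field is missing.
KEEP_CONDITION = "ctx.kubernetes?.namespace == 'payments'"

drop_pipeline = {
    "description": "Drop unwanted events before they are written to the raw index",
    "processors": [
        {
            "drop": {
                # Drop every document that does NOT match the keep condition
                "if": f"!({KEEP_CONDITION})"
            }
        }
    ],
}

if __name__ == "__main__":
    # "logs-raw-filter" is a hypothetical pipeline name
    print("PUT _ingest/pipeline/logs-raw-filter")
    print(json.dumps(drop_pipeline, indent=2))
```

Before enabling it, `POST _ingest/pipeline/logs-raw-filter/_simulate` with a few sample documents shows which ones would be dropped.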

Thanks!!

You don’t need Transform for this scenario. Transform is more suited for aggregation/pivot use cases and would introduce unnecessary duplication and extra indexing overhead.
The cleaner approach is to make the retention decision at ingest time. Route documents to different indices (or data streams) based on a condition (e.g., specific field values), and attach different ILM policies to each destination.

In other words:
Send all logs to a short-retention index by default.
During ingest, evaluate your filter condition.
If it matches, route that document to a long-retention index instead.
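The three steps above map onto an ingest pipeline with a conditional `reroute` processor (available in recent 8.x releases; on older versions a conditional `set` processor on `_index` is the usual substitute). A minimal sketch, assuming the hypothetical target `logs-long-retention`, a hypothetical condition on `kubernetes.namespace`, and a hypothetical pipeline name `logs-route`. The `target_for` function is only a local Python mirror of the same decision, useful for reasoning about which documents go where; Elasticsearch evaluates the Painless condition, not this function.

```python
import json

# Hypothetical target and condition; adjust to your own names and fields
LONG_RETENTION_TARGET = "logs-long-retention"
ROUTE_CONDITION = "ctx.kubernetes?.namespace == 'payments'"

routing_pipeline = {
    "description": "Route matching documents to the long-retention target",
    "processors": [
        {
            "reroute": {
                # Only documents matching the condition are rerouted;
                # everything else stays in the index it was sent to
                "if": ROUTE_CONDITION,
                "destination": LONG_RETENTION_TARGET,
            }
        }
    ],
}


def target_for(doc: dict, default_target: str = "logs-raw") -> str:
    """Local mirror of the pipeline's routing decision (illustration only)."""
    namespace = (doc.get("kubernetes") or {}).get("namespace")
    return LONG_RETENTION_TARGET if namespace == "payments" else default_target


if __name__ == "__main__":
    print("PUT _ingest/pipeline/logs-route")
    print(json.dumps(routing_pipeline, indent=2))
    print(target_for({"kubernetes": {"namespace": "payments"}}))
```

The pipeline can be applied either from Filebeat (the `pipeline` setting of `output.elasticsearch`) or from the cluster side via the raw index's `index.default_pipeline` setting, so each document is written exactly once to whichever target it belongs to.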
This avoids duplication, keeps the architecture simple, and scales much better than copying documents afterward.
In general, it’s best to separate retention policies through ingest-time routing rather than post-processing.
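Attaching a different ILM policy to each destination is typically done with one composable index template per target. A minimal sketch, assuming both targets are data streams, ILM policies named `logs-raw` (7 days) and `logs-long-retention` (180 days) already exist, and a routing pipeline with the hypothetical name `logs-route` is attached to the raw target as its default pipeline:

```python
import json
from typing import Optional


def data_stream_template(pattern: str, policy: str, pipeline: Optional[str] = None) -> dict:
    """Composable index template whose data streams are managed by a single ILM policy."""
    settings = {"index.lifecycle.name": policy}
    if pipeline is not None:
        # Documents indexed into matching data streams run through this pipeline by default
        settings["index.default_pipeline"] = pipeline
    return {
        "index_patterns": [pattern],
        "data_stream": {},
        # Higher than the built-in logs-*-* template's priority (100), so this template wins
        "priority": 200,
        "template": {"settings": settings},
    }


# Hypothetical names: policies and the "logs-route" pipeline must already exist
TEMPLATES = {
    "logs-raw": data_stream_template("logs-raw*", "logs-raw", pipeline="logs-route"),
    "logs-long-retention": data_stream_template("logs-long-retention*", "logs-long-retention"),
}

if __name__ == "__main__":
    for name, body in TEMPLATES.items():
        print(f"PUT _index_template/{name}")
        print(json.dumps(body, indent=2))
```

Because `logs-long-retention` also matches the built-in `logs-*-*` pattern, the higher priority matters here; otherwise that data stream would pick up the built-in logs policy instead of the 180-day one.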
