Hi,
I'm new to Elasticsearch and am running version 9.0.2, deployed to my Kubernetes cluster with the Elastic operator 3.0.0. I ingest access logs from multiple web proxies into Elasticsearch. These logs contain the IP address of the user who connected to my site. I want to keep the logs in their original form for a certain amount of time, then drop the IP and keep only these "anonymized" logs. I am struggling with this, and it feels like I am missing some fundamentals. Here is the approach I have come up with so far:
- Logs are initially ingested (from fluent-bit) into indices named apache-access-<server>-<date>, for example:
apache-access-my-server-2025.06.17
- I have an index template that applies an ILM policy to these indices. The policy is very simple: it defines rollover conditions for the hot phase and nothing else. The relevant part of the template settings:
...
"index": {
  "lifecycle": {
    "name": "apache-access-logs",
    "rollover_alias": "keep-long-term-apache-access"
  }
...
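For context, a policy of the kind described (a hot phase with rollover conditions only) might look like the sketch below, sent as the body of PUT _ilm/policy/apache-access-logs. The thresholds are placeholder assumptions, not the values actually in use.

```json
{
  "policy": {
    "_meta": {
      "description": "Illustrative sketch only; the rollover thresholds are assumed placeholders"
    },
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_size": "50gb"
          }
        }
      }
    }
  }
}
```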
- I defined an ingest pipeline that removes the field containing the user's IP. Testing the pipeline with a document from my existing indices works fine. The processors:
[
  {
    "remove": {
      "field": "host"
    }
  }
]
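A test of this kind can be run with the simulate API by sending a body along the following lines to POST _ingest/pipeline/_simulate. The sample document is a made-up placeholder (the IP comes from a documentation-only range), not one of the real log entries.

```json
{
  "pipeline": {
    "description": "Sketch of the pipeline above; the document below is made-up sample data",
    "processors": [
      { "remove": { "field": "host" } }
    ]
  },
  "docs": [
    {
      "_source": {
        "host": "203.0.113.7",
        "message": "GET /index.html HTTP/1.1"
      }
    }
  ]
}
```

If the processor behaves as intended, the simulated document in the response should still contain message but no host field.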
- I have a second index template that matches the rollover_alias from the ILM settings above and sets the default pipeline to the ingest pipeline defined above.
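Such a template might look roughly like the sketch below, sent as the body of a PUT _index_template/<template-name> request. The priority and the pipeline name drop-client-ip are hypothetical placeholders, since the actual names are not shown in this post; the index pattern is derived from the rollover alias.

```json
{
  "_meta": {
    "description": "Illustrative sketch; the priority and pipeline name are hypothetical"
  },
  "index_patterns": ["keep-long-term-apache-access*"],
  "priority": 200,
  "template": {
    "settings": {
      "index.default_pipeline": "drop-client-ip"
    }
  }
}
```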
This setup currently fails with:
"index.lifecycle.rollover_alias [keep-long-term-apache-access] does not point to index [apache-access-server-2025.06.17]"
Beyond the fix for this specific error, this makes me question whether my approach is even correct.