Hi everyone,
I've been trying to set up ILM with rollover for our APM indices, but I still can't get it to do what I want, and I'm not even sure it's possible on Elastic Cloud.
The context:
Our deployment is used for APM only. It runs on a HOT-WARM architecture, as that seemed the most suitable architecture for our usage.
We had set up an ILM policy to move indices from HOT to WARM after 3 days, reducing the number of replicas at the same time, and to delete indices after 14 days.
14 days of retention for APM is good enough for now. We will look at optimising this later down the road, but that's not the focus right now.
Out of the box, APM is configured to create one index per day.
On a normal day, our span index can contain nearly 20M documents, which translates to more than 30GB.
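For reference, the daily index names that appear further down follow an `apm-<version>-<event>-<yyyy.MM.dd>` scheme. This small Python sketch reproduces that naming; the scheme is inferred from the index names in this post, not taken from the APM Server source, and `daily_index_name` is a hypothetical helper.

```python
from datetime import date


def daily_index_name(version: str, event: str, day: date) -> str:
    """Build a daily APM index name such as apm-7.2.0-span-2019.07.24.

    The naming scheme is inferred from the index names in this post;
    this helper is hypothetical and not part of APM Server.
    """
    return f"apm-{version}-{event}-{day:%Y.%m.%d}"


if __name__ == "__main__":
    for event in ("span", "transaction", "error", "metric"):
        print(daily_index_name("7.2.0", event, date(2019, 7, 24)))
```

With one index per day, a whole 20M-document / 30GB day of spans lands in a single index, which is what rollover is meant to break up.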
After discussing some performance issues we were occasionally hitting on our deployment with Support, we started looking into rolling over our APM indices to split them into smaller ones.
Luckily for us, version 7.2 came out recently and provides ILM with rollover out of the box, and there's even a documentation page dedicated to ILM + Rollover + APM: https://www.elastic.co/guide/en/apm/server/7.2/manual-ilm-setup.html
Unfortunately, though, this page contains some instructions that cannot be performed on Elastic Cloud (step 7).
Support pointed me to https://www.elastic.co/guide/en/cloud/current/ec-configure-index-management.html, but that page barely mentions rollover.
The goal:
- ILM with rollover for the apm-* indices, especially span & transaction; if it could be applied to error & metric as well, that would make things easier.
- Phase definitions
  - HOT
    - Rollover conditions
      - size > 10GB
      - doc count > 5M
      - age > 1d
    - Priority 100
  - WARM
    - 3 days from rollover
    - 1 replica (instead of the 2 set by default via the index template)
    - Change allocation requirement to WARM nodes
    - Priority 50
  - DELETE
    - 14 days from rollover
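To get a feel for what these HOT conditions mean for the span indices, here is a back-of-the-envelope Python sketch. It assumes ingestion is spread evenly over the day, that the ~30GB figure is primary-shard size (which is what `max_size` is compared against), and that the conditions are checked continuously; in practice ILM only checks them periodically, so indices will overshoot a little. `indices_per_day` is a hypothetical helper.

```python
import math


def indices_per_day(daily_docs, daily_gb, max_docs, max_gb, max_age_days=1.0):
    """Estimate how many indices one day of data is split into by rollover.

    Assumes uniform ingestion and continuously evaluated conditions.
    Returns (indices per day, documents per index).
    """
    gb_per_doc = daily_gb / daily_docs
    # Number of documents an index can take before each condition trips.
    docs_until_size = max_gb / gb_per_doc
    docs_until_age = daily_docs * max_age_days
    docs_per_index = min(max_docs, docs_until_size, docs_until_age)
    return math.ceil(daily_docs / docs_per_index), docs_per_index


if __name__ == "__main__":
    # Figures from this post: ~20M span docs / ~30GB per day,
    # rollover at 5M docs, 10GB or 1 day, whichever comes first.
    count, per_index = indices_per_day(20_000_000, 30, 5_000_000, 10)
    print(count, per_index)  # prints: 4 5000000
```

Under these assumptions the document-count condition trips first (at roughly 7.5GB per index), so a normal day would produce about four span indices.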
This is what I've done so far:
I defined an APM policy with rollover:
GET /_ilm/policy/apm-policy-with-rollover
{
  "apm-policy-with-rollover": {
    "version": 1,
    "modified_date": "2019-07-23T13:31:52.635Z",
    "policy": {
      "phases": {
        "hot": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_size": "10gb",
              "max_age": "1d",
              "max_docs": 5000000
            },
            "set_priority": {
              "priority": 100
            }
          }
        },
        "delete": {
          "min_age": "14d",
          "actions": {
            "delete": {}
          }
        },
        "warm": {
          "min_age": "3d",
          "actions": {
            "allocate": {
              "number_of_replicas": 1,
              "include": {},
              "exclude": {},
              "require": {
                "data": "warm"
              }
            },
            "set_priority": {
              "priority": 50
            }
          }
        }
      }
    }
  }
}
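As far as I understand the ILM docs, once an index has been rolled over, the `min_age` of the later phases is counted from the rollover time rather than from index creation, and phases always run in the fixed order hot, warm, cold, delete, whatever the key order in the policy JSON (here `delete` is listed before `warm`). A small Python sketch of the resulting timeline for one rolled-over index; `phase_schedule` is a hypothetical helper and the date is just an example.

```python
from datetime import datetime, timedelta

# ILM runs phases in this fixed order, regardless of key order in the policy JSON.
PHASE_ORDER = ["hot", "warm", "cold", "delete"]


def phase_schedule(rolled_over_at, min_ages):
    """Return (phase, start time) pairs for the phases after hot.

    Assumes min_age is measured from the rollover time, which is how
    ILM treats indices that have been rolled over.
    """
    return [
        (phase, rolled_over_at + min_ages[phase])
        for phase in PHASE_ORDER
        if phase != "hot" and phase in min_ages
    ]


if __name__ == "__main__":
    # Same key order as the policy above: delete before warm.
    policy_min_ages = {
        "hot": timedelta(0),
        "delete": timedelta(days=14),
        "warm": timedelta(days=3),
    }
    for phase, start in phase_schedule(datetime(2019, 7, 24), policy_min_ages):
        print(phase, start.date())
```

One consequence: documents written just after an index is created stay in it until rollover (up to 1 day with `max_age: 1d`) and then for another 14 days, so the oldest documents are kept for closer to 15 days.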
I created aliases for the APM indices:
GET /_alias/apm-*
{
  "apm-7.2.0-error-000001": {
    "aliases": {
      "apm-7.2.0-error": {
        "is_write_index": true
      }
    }
  },
  "apm-7.2.0-error-2019.07.24": {
    "aliases": {
      "apm-7.2.0-error": {}
    }
  },
  "apm-7.2.0-metric-2019.07.24": {
    "aliases": {
      "apm-7.2.0-metric": {}
    }
  },
  "apm-7.2.0-metric-000001": {
    "aliases": {
      "apm-7.2.0-metric": {
        "is_write_index": true
      }
    }
  },
  "apm-7.2.0-span-000001": {
    "aliases": {
      "apm-7.2.0-span": {
        "is_write_index": true
      }
    }
  },
  "apm-7.2.0-span-2019.07.24": {
    "aliases": {
      "apm-7.2.0-span": {}
    }
  },
  "apm-7.2.0-transaction-2019.07.24": {
    "aliases": {
      "apm-7.2.0-transaction": {}
    }
  },
  "apm-7.2.0-transaction-000001": {
    "aliases": {
      "apm-7.2.0-transaction": {
        "is_write_index": true
      }
    }
  }
}
(Edited for brevity: I have many more "day" indices, not just today's.)
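For context on why the `-000001` suffix matters: as I read the rollover API docs, when no new index name is given, Elasticsearch derives one by incrementing a trailing `-<number>` and zero-padding it to six digits, and an index whose name does not end in `-<digits>` cannot be rolled over that way. A minimal Python sketch of that naming rule; `next_rollover_name` is a hypothetical helper, not an Elasticsearch API.

```python
import re

# A new name can only be derived when the current one ends in "-<digits>".
_ROLLOVER_NAME = re.compile(r"^(.*-)(\d+)$")


def next_rollover_name(index):
    """Return the name a rollover would generate, or None if none can be derived."""
    match = _ROLLOVER_NAME.match(index)
    if match is None:
        return None
    prefix, counter = match.groups()
    # The counter is incremented and zero-padded to six digits.
    return f"{prefix}{int(counter) + 1:06d}"


if __name__ == "__main__":
    for index in ("apm-7.2.0-span-000001", "apm-7.2.0-span-2019.07.24"):
        print(index, "->", next_rollover_name(index))
```

So `apm-7.2.0-span-000001` would roll over to `apm-7.2.0-span-000002`, while the dated `apm-7.2.0-span-2019.07.24` has no derivable successor name.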
And I've set up new index templates so that future indices get the lifecycle policy, the rollover alias and the index alias:
GET /_template/apm-7.2.0-*
{
  "apm-7.2.0-span": {
    "order": 2,
    "index_patterns": [
      "apm-7.2.0-span-*.*.*"
    ],
    "settings": {
      "index": {
        "lifecycle": {
          "name": "apm-policy-with-rollover",
          "rollover_alias": "apm-7.2.0-span"
        }
      }
    },
    "mappings": {},
    "aliases": {
      "apm-7.2.0-span": {}
    }
  },
  "apm-7.2.0-transaction": {
    "order": 2,
    "index_patterns": [
      "apm-7.2.0-transaction-*.*.*"
    ],
    "settings": {
      "index": {
        "lifecycle": {
          "name": "apm-policy-with-rollover",
          "rollover_alias": "apm-7.2.0-transaction"
        }
      }
    },
    "mappings": {},
    "aliases": {
      "apm-7.2.0-transaction": {}
    }
  },
  "apm-7.2.0-metric": {
    "order": 2,
    "index_patterns": [
      "apm-7.2.0-metric-*.*.*"
    ],
    "settings": {
      "index": {
        "lifecycle": {
          "name": "apm-policy-with-rollover",
          "rollover_alias": "apm-7.2.0-metric"
        }
      }
    },
    "mappings": {},
    "aliases": {
      "apm-7.2.0-metric": {}
    }
  },
  "apm-7.2.0-error": {
    "order": 2,
    "index_patterns": [
      "apm-7.2.0-error-*.*.*"
    ],
    "settings": {
      "index": {
        "lifecycle": {
          "name": "apm-policy-with-rollover",
          "rollover_alias": "apm-7.2.0-error"
        }
      }
    },
    "mappings": {},
    "aliases": {
      "apm-7.2.0-error": {}
    }
  }
}
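A quick way to check which index names those `index_patterns` cover. This sketch uses Python's `fnmatch.fnmatchcase`, assuming its `*` behaves like the `*` wildcard in Elasticsearch index patterns for these names (it should here, since none of them contain `?` or `[`). `matching_templates` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# index_patterns copied from the templates above.
TEMPLATE_PATTERNS = {
    "apm-7.2.0-span": "apm-7.2.0-span-*.*.*",
    "apm-7.2.0-transaction": "apm-7.2.0-transaction-*.*.*",
    "apm-7.2.0-metric": "apm-7.2.0-metric-*.*.*",
    "apm-7.2.0-error": "apm-7.2.0-error-*.*.*",
}


def matching_templates(index_name):
    """Names of the templates above whose pattern matches index_name."""
    return [
        name
        for name, pattern in TEMPLATE_PATTERNS.items()
        if fnmatchcase(index_name, pattern)
    ]


if __name__ == "__main__":
    for index in (
        "apm-7.2.0-span-2019.07.24",
        "apm-7.2.0-span-000001",
        "apm-7.2.0-span-000002",
    ):
        print(index, matching_templates(index))
```

If that holds, the `*.*.*` patterns only match the dated daily indices, so the `-000001` indices (and any `-000002` a rollover would create) would not pick up `index.lifecycle.name` or `index.lifecycle.rollover_alias` from these particular templates.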
And today's indices look like this:
GET /apm-7.2.0-span-2019.07.24
{
  "apm-7.2.0-span-2019.07.24": {
    "aliases": {
      "apm-7.2.0-span": {}
    },
    "mappings": {...},
    "settings": {
      "index": {
        "mapping": {...},
        "auto_expand_replicas": "false",
        "provided_name": "apm-7.2.0-span-2019.07.24",
        "query": {...},
        "creation_date": "1563926406411",
        "priority": "100",
        "number_of_replicas": "2",
        ...
        "lifecycle": {
          "name": "apm-policy-with-rollover",
          "rollover_alias": "apm-7.2.0-span"
        },
        "codec": "best_compression",
        "routing": {
          "allocation": {
            "require": {
              "data": "hot"
            }
          }
        },
        "number_of_shards": "1"
      }
    }
  }
}
(Again, edited for brevity.)
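My understanding of the ILM rollover docs is that an index can only be rolled over by ILM if it is the write index of its `index.lifecycle.rollover_alias`. Today's `apm-7.2.0-span-2019.07.24` has `rollover_alias` set to `apm-7.2.0-span` but, per the `GET /_alias/apm-*` output, it is not that alias's write index. A hypothetical Python check over a trimmed copy of that alias response (span indices only); `write_index_for` and `non_write_members` are illustrative helpers, not Elasticsearch APIs.

```python
# Trimmed copy of the GET /_alias/apm-* response shown in this post (span only).
ALIASES_RESPONSE = {
    "apm-7.2.0-span-000001": {"aliases": {"apm-7.2.0-span": {"is_write_index": True}}},
    "apm-7.2.0-span-2019.07.24": {"aliases": {"apm-7.2.0-span": {}}},
}


def write_index_for(alias, response):
    """Return the index marked is_write_index for alias, or None if there is none."""
    for index, body in response.items():
        if body["aliases"].get(alias, {}).get("is_write_index", False):
            return index
    return None


def non_write_members(alias, response):
    """Indices that belong to alias but are not its write index, sorted by name."""
    writer = write_index_for(alias, response)
    return sorted(
        index
        for index, body in response.items()
        if alias in body["aliases"] and index != writer
    )


if __name__ == "__main__":
    print(write_index_for("apm-7.2.0-span", ALIASES_RESPONSE))
    print(non_write_members("apm-7.2.0-span", ALIASES_RESPONSE))
```

If that reading of the docs is right, any index listed by `non_write_members` that also carries the lifecycle settings (like today's dated span index) is one ILM will not be able to roll over.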
Can someone help me find what piece I'm missing to make this whole thing work?
The pieces I added today are the apm-7.2.0-*-000001 indices, each set as the write index of its alias. I hope this was the only thing missing and that everything will magically start working tonight...
But if you have other ideas I'm all ears.