We are using an Elastic deployment to collect APM and RUM data for a website, as part of a project to improve the user experience. The project will run for a few weeks or months, so we want to roll up the data to be able to store it for a longer period. In the Observability screen we always use the 75th percentile to check performance, but a rollup doesn't support the 75th percentile aggregation. How can we roll up this data and still see the 75th percentile of the measurements? It isn't a problem if it has to be shown in a custom dashboard rather than in the User Experience app.
For APM, we don't use Elasticsearch's rollup. Instead, APM Server does its own pre-aggregation in memory and stores the result in a designated data stream every 1 minute. When you load the User Experience app, it runs Elasticsearch aggregations on that data stream and displays the metrics.
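To illustrate why pre-aggregated histograms can still answer a 75th percentile question, here is a toy sketch in Python. It is not APM Server's actual algorithm: the fixed 100 ms bucket width and the helper names `to_histogram`, `merge` and `percentile` are assumptions made up for this example.

```python
from collections import Counter

def to_histogram(durations_ms, bucket_ms=100):
    """Pre-aggregate raw durations into a {bucket lower bound: count} histogram."""
    hist = Counter()
    for d in durations_ms:
        hist[(d // bucket_ms) * bucket_ms] += 1
    return hist

def merge(*hists):
    """Combine histograms from several intervals by summing bucket counts."""
    total = Counter()
    for h in hists:
        total.update(h)
    return total

def percentile(hist, pct):
    """Return the lower bound of the bucket that contains the pct-th percentile."""
    threshold = sum(hist.values()) * pct / 100
    running = 0
    for bucket in sorted(hist):
        running += hist[bucket]
        if running >= threshold:
            return bucket
    raise ValueError("empty histogram")

# Two hypothetical one-minute windows of page-load durations (milliseconds).
minute_1 = to_histogram([120, 250, 310, 480])
minute_2 = to_histogram([90, 150, 900, 1300])

# The raw samples are gone, but the merged histogram still yields a p75 estimate.
print(percentile(merge(minute_1, minute_2), 75))
```

The estimate is only as precise as the bucket width, which is the usual trade-off with histogram-based pre-aggregation.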
The easiest approach I would recommend is to store the metrics for a longer time by configuring the ILM policy. You will be able to use the existing User Experience UI and see data for a longer period, at the cost of using more storage. The storage, however, can be optimized after version 8.7.0, as we are planning to do rollups in memory for different intervals, such as 10m and 60m. Giving the ILM policy for the metrics generated at 60m intervals a longer delete phase will store fewer documents. The "Data tiers" page of the Elasticsearch Guide [8.6] may also be useful.
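As a rough sketch of what a longer delete definition could look like, the Python snippet below builds a request body for the ILM put-policy API (`PUT _ilm/policy/<name>`). The `365d` retention, the `30d` rollover age and the function name `build_ilm_policy` are assumptions for illustration only, not values recommended in this thread; adapt them to the policy your APM data streams actually use.

```python
import json

def build_ilm_policy(delete_after="365d", rollover_after="30d"):
    """Build an ILM policy body that rolls over in the hot phase and deletes later.

    Both durations are illustrative assumptions, not recommended values.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new backing index once the current one is this old.
                        "rollover": {"max_age": rollover_after}
                    }
                },
                "delete": {
                    # Age (counted from rollover) at which the index is deleted.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }

if __name__ == "__main__":
    # Paste the printed JSON into Kibana Dev Tools under PUT _ilm/policy/<your-policy-name>.
    print(json.dumps(build_ilm_policy(), indent=2))
```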
Thanks for your reply!
We do have the ILM policies set, but we are going to run out of space if we set the retention too long. We want to store the data in a much more condensed form for up to a year. What I forgot to mention is that the RUM data is what we want to keep for a long period; the APM data doesn't have to be kept. Will the longer-interval rollups that come after 8.7.0 also work for RUM data? And which aggregation will be used to roll up that data?
No worries, you guessed it right the first time! Indeed, page-load is the most important metric for us. We want to see that metric change over time as we make improvements to the pages those metrics belong to.
If you are only interested in page-load duration metrics, you can simply keep the metrics-apm.internal* data stream for a longer period, as it contains a pre-aggregated page-load duration histogram. After 8.7, though, we are changing the data stream name so it will become metrics-apm.transaction.<interval>-* . This is also needed to support longer-interval metrics, and it means you will need to update the ILM policy again.
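For a custom dashboard, the 75th percentile can in principle be read from that histogram with a `percentiles` aggregation, which Elasticsearch supports on histogram fields. The sketch below builds such a search body in Python. The field name `transaction.duration.histogram`, the `transaction.type: page-load` filter and the daily interval are assumptions; check them against the mappings of your own data stream before relying on them.

```python
import json

def build_p75_query(interval="1d"):
    """Build a search body that charts the p75 page-load duration over time.

    Field names are assumptions about the APM metrics mapping; verify them first.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the documents
        "query": {
            "bool": {
                "filter": [{"term": {"transaction.type": "page-load"}}]
            }
        },
        "aggs": {
            "over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval},
                "aggs": {
                    "p75_duration": {
                        "percentiles": {
                            "field": "transaction.duration.histogram",
                            "percents": [75],
                        }
                    }
                },
            }
        },
    }

if __name__ == "__main__":
    # Run the printed JSON against GET metrics-apm.internal*/_search in Dev Tools.
    print(json.dumps(build_p75_query(), indent=2))
```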
Unfortunately, if you are also interested in RUM data other than page-load, I don't think we have an answer for rolling up the RUM-specific metrics at the moment. So sorry for not being helpful here!