I created a SaaS application and I'm using APM to track metrics (both server side Java and client side Angular).
APM is an incredible tool and it tracks very well. I track transactions and spans, but spans are only useful for a short period of time, when I need more fine-grained checks.
What I'd like to do is collect general information about the system to calculate how each of my customers impacts it. For example, I want to track:
how many DB (MySQL, Redis, etc.) operations are made
how many HTTP operations are made
the average time of each transaction
and details like that. I want to aggregate all this data into a per-day metric.
This way I can set an ILM policy for the APM index, removing old data after a certain period of time, and have a new index of aggregated data that I will keep for a long time to have a general overview of the system.
Do you have any suggestions on how I can accomplish this with the ES stack? (I'm on ES cloud.) Do I have to implement something on my end, or can I do this kind of aggregation with a script directly in ES?
With that said, what you want to do is perfectly doable with Elastic APM, thanks to its foundation on the Elastic Stack. You have a couple of choices:
Design your aggregatable metric index as a data stream
Compute an entity-based index using a transform
Let me highlight these options so you can better choose.
Design your aggregatable metric index as a data stream
Data streams behave much like plain indices, except that they are append-only and can span multiple backing indices behind the scenes. Conceptually, you append data into something virtual. How many backing indices are created depends on how you want to partition them, which you can control with an ILM policy. Using data streams, you can accumulate your metrics as time-based data and run aggregate queries to fetch data any time you want. The size of the data stream stays manageable: the ILM rules already break it into smaller indices, and historical data can be moved to cheaper storage, such as object storage, using searchable snapshots.
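To make this more concrete, here is a minimal sketch in Python that only builds the JSON bodies you would send to `PUT _ilm/policy/<name>` and `PUT _index_template/<name>`. The names `customer-metrics*` and `customer-metrics-policy`, the phase timings, and the `found-snapshots` repository are assumptions for illustration, so adjust them to your deployment.

```python
import json


def build_ilm_policy(rollover_age="1d", frozen_after="30d", delete_after="365d",
                     snapshot_repository="found-snapshots"):
    """Body for PUT _ilm/policy/<name>. All timings are illustrative assumptions."""
    return {
        "policy": {
            "phases": {
                # Roll over to a new backing index once it is old or large enough.
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": rollover_age,
                            "max_primary_shard_size": "50gb",
                        }
                    }
                },
                # Move older backing indices to object storage as searchable snapshots.
                "frozen": {
                    "min_age": frozen_after,
                    "actions": {
                        "searchable_snapshot": {
                            "snapshot_repository": snapshot_repository
                        }
                    },
                },
                # Drop the data entirely after the retention period.
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def build_data_stream_template(pattern, policy_name):
    """Body for PUT _index_template/<name>; the empty data_stream object enables data streams."""
    return {
        "index_patterns": [pattern],
        "data_stream": {},
        "priority": 200,
        "template": {
            "settings": {"index.lifecycle.name": policy_name},
            # Every document in a data stream needs an @timestamp field.
            "mappings": {"properties": {"@timestamp": {"type": "date"}}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy(), indent=2))
    print(json.dumps(
        build_data_stream_template("customer-metrics*", "customer-metrics-policy"),
        indent=2,
    ))
```

Once both are in place, the first document indexed into a name matching the pattern creates the data stream, and the policy takes care of rollover and retention from there.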
@aravindputrevu from the Elastic community team talked about how data streams work here:
Compute an entity-based index using a transform
Instead of having an ever-growing index against which you run point-in-time queries to retrieve your daily aggregates, you can build an entity-centric index that always presents the metrics per day. The summarization process that produces it runs automatically behind the scenes and is called a transform. Transforms can be executed either in batch or in continuous mode. By continuous, I mean that as new data arrives in the source index (which can very well be a data stream), the aggregate is recalculated and the latest results are written to the output index, which is the one that backs your dashboard.
The tutorial below may give you a good understanding of how transforms work:
I think it makes perfect sense to combine the two approaches, but the best thing about the Elastic Stack is that you have options to choose from. I wrote a demo last year that uses these two concepts to build a real-time scoreboard for the Pac-Man game:
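As a rough illustration, this Python sketch builds the body for `PUT _transform/<id>` for a continuous transform that groups by customer and by day. It assumes your APM documents carry a hypothetical `labels.customer_id` field on both transactions and spans, and that DB and HTTP spans can be recognized through `span.type` and `span.subtype`. The index names are placeholders too, so please check them against your own APM data.

```python
import json


def build_daily_customer_transform(source_index, dest_index,
                                   customer_field="labels.customer_id"):
    """Body for PUT _transform/<id>. The customer field name is a hypothetical label."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        # How often the transform checks the source for changes.
        "frequency": "5m",
        # The sync section is what makes the transform continuous instead of batch.
        "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
        "pivot": {
            # One output document per customer per calendar day.
            "group_by": {
                "customer": {"terms": {"field": customer_field}},
                "day": {
                    "date_histogram": {
                        "field": "@timestamp",
                        "calendar_interval": "1d",
                    }
                },
            },
            "aggregations": {
                # Span documents have no transaction duration, so avg skips them.
                "avg_transaction_duration_us": {
                    "avg": {"field": "transaction.duration.us"}
                },
                # A filter aggregation yields the number of matching documents.
                "db_operations": {"filter": {"term": {"span.type": "db"}}},
                "http_operations": {"filter": {"term": {"span.subtype": "http"}}},
            },
        },
    }


if __name__ == "__main__":
    body = build_daily_customer_transform("apm-*", "customer-daily-metrics")
    print(json.dumps(body, indent=2))
```

After creating the transform you still have to start it with `POST _transform/<id>/_start`. The destination index then holds one small document per customer per day, which you can keep for a long time while ILM removes the raw APM data.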
Let me know if you need more help with this. I'm glad to help you build this.