[APM][JAVA] Solutions, best practices and APM test guidelines

Hi guys,
we are currently using the Elastic APM Java agent to monitor several Java applications. We collect transactions and JVM metrics, and we use Logstash as an ETL layer to enrich the data based on business logic. So far we have developed three Kibana dashboards, organized as follows:

  • transaction statistics (averages, response status...)
  • metrics
  • transaction trends

We are also planning to implement custom transactions so that we can correlate them with log events and improve system observability.

Questions:

  1. Do you have any suggestions about our solution, based on your experience with other projects? E.g. we could correlate transaction statistics with metrics collected in the same time period. Anything else? :slight_smile:
  2. Is there something else we could implement using Elastic APM, or are we already making the most of this technology? Are there features we are missing, or that will be implemented in the near future? NB: we are going to use the ML and alerting features as soon as we have the subscription installed.
  3. In your experience, are there guidelines or best practices for a performance test that compares JVM performance with and without the APM agent attached? E.g.:
  • APM agent attached: we can easily collect transaction response times and calculate their averages
  • APM agent not attached: we could use entity-centric indices and a Python script to calculate response times from the logs and a custom transaction id
    Any suggestions on this point?
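
A minimal sketch of the "agent not attached" option, for illustration only. It assumes each log line has the hypothetical form `<ISO timestamp> txid=<id> event=<start|end>`; the field names `txid` and `event`, the function name and the sample lines are all assumptions, not an existing log format.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical log format: "<ISO timestamp> txid=<id> event=<start|end>"
SAMPLE_LOG = """\
2019-10-01T12:00:00.000 txid=a1 event=start
2019-10-01T12:00:00.100 txid=b2 event=start
2019-10-01T12:00:00.250 txid=a1 event=end
2019-10-01T12:00:00.500 txid=b2 event=end
2019-10-01T12:00:01.000 txid=c3 event=start
"""


def response_times_ms(lines):
    """Return {txid: duration in ms} for transactions with both a start and an end event."""
    started = {}
    durations = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        timestamp = datetime.fromisoformat(parts[0])
        txid = parts[1].partition("=")[2]
        event = parts[2].partition("=")[2]
        if event == "start":
            started[txid] = timestamp
        elif event == "end" and txid in started:
            elapsed = timestamp - started.pop(txid)
            durations[txid] = elapsed / timedelta(milliseconds=1)
    return durations


durations = response_times_ms(SAMPLE_LOG.splitlines())
average_ms = mean(durations.values())
print(durations, average_ms)
```

Transactions that never log an end event (`c3` in the sample) are left out of the average rather than counted as zero.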

Thanks a lot,

Andrea

Hi Andrea,

Well, those are very broad questions :slight_smile: , but I'll try:

My only advice is to dig into the transaction/span/error documents and see what you have there. For example, since APM Server 7.3, the client.ip field is populated by default, so you can use it for geolocation.
Note that since 7.2 we have dedicated JVM metrics views, and those will be greatly enhanced in 7.5.

There is a lot more that can be done :slight_smile: . The most basic question: do you use other agents as well, to take advantage of our distributed tracing capabilities? For example, did you try using the RUM agent?
Other than that, there is a lot of data collected, and it's up to you to ask the questions and find the relevant information to answer them. Just as an example, you can get some interesting insights by looking at the request user agent or at specific headers (that may be mapped to users).

We are working tirelessly on new features and integrations, so keep following.

The best approach is to use independent tools to measure the same scenarios/loads with and without the agent. If you have load tests, use them. Compare system and JVM metrics (e.g. CPU, heap/non-heap usage etc.) and, if you have some end-user tools, measure the overall latency overhead per request.
Also, you may find this blog post interesting.
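
As a rough illustration of that comparison, here is a minimal Python sketch that takes per-request latencies from two runs of the same load test and reports the relative overhead. The sample numbers are made up, and the function names `summarize` and `overhead_percent` are hypothetical; in practice the latencies would come from whatever independent load-testing tool you use.

```python
from statistics import mean, quantiles


def summarize(latencies_ms):
    """Mean and 95th percentile of a list of request latencies in milliseconds."""
    # With n=20 there are 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(latencies_ms, n=20, method="inclusive")[-1]
    return {"mean": mean(latencies_ms), "p95": p95}


def overhead_percent(baseline, with_agent):
    """Relative increase of each statistic, as a percentage of the baseline value."""
    return {
        key: (with_agent[key] - baseline[key]) / baseline[key] * 100.0
        for key in baseline
    }


# Made-up sample data: same scenario, run without and with the agent attached.
baseline_ms = [10.0, 12.0, 11.0, 13.0, 10.0, 12.0, 11.0, 14.0, 10.0, 12.0]
agent_ms = [value * 1.05 for value in baseline_ms]

overhead = overhead_percent(summarize(baseline_ms), summarize(agent_ms))
print(overhead)
```

Looking at a high percentile as well as the mean helps, because overhead that only shows up on slow requests can be hidden by an average.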

I hope this helps.
Eyal.