Network View Visualisation

Not sure if this should be in the Kibana section, but since this question is about APM, I guess it makes sense to put it in here.

In our previous APM tool, AppDynamics, we had a network view visualisation similar to the one below (ctto)

Will Elastic APM have something similar soon?

Even Pinpoint has something like it:

As I would like to have something similar in Elastic APM, I tried using Graph but couldn't find a way to replicate it. I was basically showing services as nodes, trying to associate similar transaction/trace names to get a count, and attempting to compute timings from the transaction.duration.us metrics, but my efforts are (shamefully) failing.
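To make the attempt above concrete, here is a minimal sketch of the aggregation I have in mind, done outside of Kibana. It assumes the trace documents have already been fetched and flattened into plain dicts; the keys `id`, `parent_id`, `service` and `duration_us` are hypothetical stand-ins for fields such as `parent.id`, `service.name` and `transaction.duration.us`, and the function name is made up for illustration. It is not how Graph or the APM UI works internally.

```python
from collections import defaultdict


def build_service_map(docs):
    """Aggregate caller -> callee edges between services from trace documents."""
    # Map each transaction/span id to the service that produced it.
    service_by_id = {d["id"]: d["service"] for d in docs}

    edges = defaultdict(lambda: {"count": 0, "total_us": 0})
    for d in docs:
        parent_service = service_by_id.get(d.get("parent_id"))
        # Only a hop between two different services becomes an edge.
        if parent_service is None or parent_service == d["service"]:
            continue
        edge = edges[(parent_service, d["service"])]
        edge["count"] += 1
        edge["total_us"] += d["duration_us"]

    # Report the call count and the average duration per edge.
    return {
        pair: {"count": e["count"], "avg_us": e["total_us"] / e["count"]}
        for pair, e in edges.items()
    }


# Hypothetical sample data: one frontend request fanning out to two services.
docs = [
    {"id": "t1", "parent_id": None, "service": "frontend", "duration_us": 900},
    {"id": "t2", "parent_id": "t1", "service": "orders", "duration_us": 400},
    {"id": "t3", "parent_id": "t2", "service": "payments", "duration_us": 100},
    {"id": "t4", "parent_id": "t1", "service": "orders", "duration_us": 600},
]
service_map = build_service_map(docs)
```

Each key of `service_map` is a (caller, callee) pair, which is roughly the node/link structure a network view would draw.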

Any leads/advice on this?

Thanks!

Hi Ronald,

Thanks for getting in touch; your topic is indeed in the right place. Service maps are something that we're considering adding to the Kibana APM UI, but there's no committed implementation or timeline as of yet. We'll take your suggestion as a big +1 for adding such a visualization feature.

May I ask a few questions along the way to help us pinpoint the use-case:

  • Would you consider this view as your main entry point to investigating a possible service performance issue further, instead of the default services / traces lists?
  • Would the view cover your use-case if you were able to map a single trace, considering that these views often become overwhelming quickly when showing all services or all traces at the same time?
  • Beyond seeing the relationship between services and basic metrics on each service, what other information do you see would make this view even more useful?

Thanks in advance!

Hi Casper, many thanks for the update. We'd really love to have this services map as it has proven crucial for us to see at a glance where bottlenecks are, allowing us to focus troubleshooting in parallel.

For our own use case, we have the following (corresponding to the order of the bulleted questions above):

  1. The main entry point is still the list of slowest traces (arranged in descending order of duration, configured to show the trend for the past 6 hours with 5 minute aggregates). This list would ideally be split into two parts:
    (a) a list that we explicitly set (our "key" business transactions)
    (b) list of the slowest traces by duration

When a transaction/trace is clicked, it will go to the services view showing the relevant services traversed.

  2. Having a filterable service list would probably be the ideal one where one can select a service and the corresponding traces that flow through it are displayed. An overall global services view is a nice-to-have, but might quickly become a Death Star diagram of hundreds of services and links if there is no way to limit the view via a top traces filter of sorts (by request volume maybe? or by total duration/error-rate/etc?)

  3. A drill down view would be great to show associated metrics like errors or error rates (the coloured circles of AppD make for a nice reference for an "at-a-glance" view of slow vs fast, good vs error thingy). A quick time period switcher would also be good to have, to show how the numbers for the traces compare over, say, the past one hour vs the past 24 hours. Also good to have would be a link to the most recent traces (maybe a link to the Discover page with auto-filtering for the relevant traces). Another possible nice-to-have would be that clicking on a service shows statistics for the worst contributors to bad numbers (say, recurring errors or stack info).
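As a rough illustration of the filtering and "at-a-glance" colouring described in the list above, something along these lines is what I picture. This is only a sketch: the edge statistics, the function names and the thresholds (5% error rate, 1 second average duration) are all assumptions for the example, not anything AppD or Elastic APM defines.

```python
def top_edges(edges, n, key="count"):
    """Keep only the n edges with the largest value for `key`."""
    ranked = sorted(edges.items(), key=lambda item: item[1][key], reverse=True)
    return dict(ranked[:n])


def status_colour(error_rate, avg_us, max_error_rate=0.05, slow_us=1_000_000):
    """Classify an edge: errors take priority over slowness."""
    if error_rate > max_error_rate:
        return "red"
    if avg_us > slow_us:
        return "yellow"
    return "green"


# Hypothetical per-edge statistics for a selected time period.
edges = {
    ("frontend", "orders"): {"count": 120, "errors": 2, "avg_us": 300_000},
    ("orders", "payments"): {"count": 80, "errors": 12, "avg_us": 200_000},
    ("orders", "inventory"): {"count": 5, "errors": 0, "avg_us": 2_000_000},
}

# Limit the view to the two busiest links, then colour each one.
visible = top_edges(edges, 2)
colours = {
    pair: status_colour(e["errors"] / e["count"], e["avg_us"])
    for pair, e in visible.items()
}
```

Passing `key="avg_us"` instead would rank by duration, which is the kind of switchable "top traces" filter that keeps the global view from turning into a Death Star diagram.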

One cool feature to have in this regard is a "snapshot" where we can either keep a record of all states/info/statistics for a point-in-view time in another APM index, or if this is troublesome, at least a way to export the info on the current view so that we can be sure to keep a history and not lose it if we apply some ILM/roll up to our APM indices.
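For the export variant of the snapshot idea, a minimal sketch could look like the following. It assumes the current view is available as a dict of (caller, callee) pairs with some statistics; the `snapshot_view` name and the JSON layout are hypothetical, just one possible shape for such an export.

```python
import json
from datetime import datetime, timezone


def snapshot_view(edges, taken_at=None):
    """Serialise the current view to JSON, stamped with the snapshot time."""
    taken_at = taken_at or datetime.now(timezone.utc)
    return json.dumps(
        {
            "taken_at": taken_at.isoformat(),
            # Tuple keys are not valid JSON keys, so emit one record per edge.
            "edges": [
                {"source": source, "target": target, **stats}
                for (source, target), stats in sorted(edges.items())
            ],
        },
        indent=2,
    )


# Hypothetical view contents at the moment of the snapshot.
edges = {
    ("orders", "payments"): {"count": 80, "avg_us": 200_000},
    ("frontend", "orders"): {"count": 120, "avg_us": 300_000},
}
snapshot = snapshot_view(edges, datetime(2019, 1, 1, tzinfo=timezone.utc))
```

The resulting string could be written to a file or indexed elsewhere, so the numbers survive even after ILM/roll up has been applied to the APM indices.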

Thanks Ronald! Really appreciate the thorough feedback and use-case stories provided here. It will help us greatly in scoping such a feature going forward.

  1. The main entry point is still the list of slowest traces (arranged in descending order of duration, configured to show the trend for the past 6 hours with 5 minute aggregates). This list would ideally be split into two parts:
    (a) a list that we explicitly set (our "key" business transactions)
    (b) list of the slowest traces by duration

The important (key business) transactions are something that we have on our radar, and we know this greatly improves the focus on solving performance problems that are relevant for your business.

Having a filterable service list would probably be the ideal one where one can select a service and the corresponding traces that flow through it are displayed. An overall global services view is a nice-to-have, but might quickly become a Death Star diagram of hundreds of services and links if there is no way to limit the view via a top traces filter of sorts (by request volume maybe? or by total duration/error-rate/etc?)

I agree with the notion that the global services map is more of a great visual feature than one that remedies the need for quickly troubleshooting performance issues. Filtering might help, but essentially it's more important to see the services map for a given transaction/trace that is important to you. Thanks for confirming that story.

A drill down view would be great to show associated metrics like errors or error rates (the coloured circles of AppD makes for a nice reference for an "at-a-glance" view of slow vs fast, good vs error thingy). A quick time period switcher would also be good to have to show how the numbers of the traces are say past one hour vs past 24 hours. Also good to have would be a link to the most recent traces (maybe a link to the Discover page with auto-filtering for the relevant traces). Another possible nice-to-have would be clicking on a service shows statistics for the worst contributors for bad numbers (say, recurring errors or stack info).

Adding visual cues, like highlighting problematic services, is important for dissecting the amount of information that even a single-trace service map would give you. I like the suggestion of showing top lists for each service that would help you quickly find the details that you're looking for.

One cool feature to have in this regard is a "snapshot" where we can either keep a record of all states/info/statistics for a point-in-view time in another APM index, or if this is troublesome, at least a way to export the info on the current view so that we can be sure to keep a history and not lose it if we apply some ILM/roll up to our APM indices.

That's an interesting idea to snapshot the information for later reference or even comparison.

Thanks again for giving us this valuable feedback.

Hi Casper,

I'm glad we could help in any way. We really are committed on our side to maximising the Elastic stack for observability, so we continue to push the boundaries of what we can do and test on our end.

I have to admit I am a little oriented towards an AppD way of doing things, as that has been our methodology in the past. We are very open to new ways of doing things if you think there are more efficient ways of doing these.

Thanks for accommodating these inputs, much appreciated.

Let's leave this topic open in case others have suggestions or use-case stories to share. I'll make sure to share any updates on our end on the feature.

Thanks again.

Hi there,

I'd like to follow up on my previous post asking about this.

My experience is also biased by AppDynamics, but I think their tooling is awesome.
I agree with Ronald in terms of what to expect from the service map functionality: average latencies and errors, different colours per average HTTP codes, correlation between transactions and the service map itself, etc.

I'll be following this topic as I'm very interested in finding out more about the progress of this feature.

Regards,
Carlos M.

Thanks Carlos for adding your thoughts and suggestions.

Very nice thread. I was looking for this and wish Elastic would implement the service map functionality as early as possible, as this could be a game changer in APM monitoring.
