Thanks Ronald! Really appreciate the thorough feedback and use-case stories provided here. It will help us greatly in scoping such a feature going forward.
- The main entry point is still the list of slowest traces (arranged in descending order of duration and configured to show the trend for the past 6 hours with 5-minute aggregates). This list would ideally be split into two parts:
(a) a list that we explicitly set (our "key" business transactions)
(b) a list of the slowest traces by duration
Key business transactions are something we have on our radar, and we know they greatly help focus performance troubleshooting on the problems that are relevant to your business.
A filterable service list would probably be ideal: one could select a service and see the traces that flow through it. An overall global services view is a nice-to-have, but it might quickly become a Death Star diagram of hundreds of services and links unless there is a way to limit the view with some kind of top-traces filter (by request volume, maybe? Or by total duration, error rate, etc.?)
I agree that the global services map is more of a great visual feature than something that addresses the need to troubleshoot performance issues quickly. Filtering might help, but what matters more is seeing the services map for a given transaction or trace that is important to you. Thanks for confirming that story.
A drill-down view would be great for showing associated metrics like errors or error rates (AppD's coloured circles make a nice reference for an at-a-glance view of slow vs. fast and good vs. erroring). A quick time-period switcher would also be good to have, to compare trace numbers over, say, the past hour vs. the past 24 hours. Also good to have would be a link to the most recent traces (maybe a link to the Discover page, auto-filtered to the relevant traces). Another possible nice-to-have: clicking a service shows statistics on the worst contributors to bad numbers (say, recurring errors or stack info).
Visual cues, like highlighting problematic services, are still important for dissecting the amount of information that even a single trace's service map gives you. I like the suggestion of showing top lists for each service, which would help you quickly find the details you're looking for.
One cool feature to have in this regard would be a "snapshot": either keep a record of all states/info/statistics for a point in time in another APM index or, if that is troublesome, at least a way to export the info in the current view, so that we can be sure to keep a history and not lose it when we apply ILM or rollups to our APM indices.
Snapshotting the information for later reference, or even comparison, is an interesting idea.
Thanks again for giving us this valuable feedback.