How to explore transaction history for every request?


(Shinebayar Gansukh) #1

Hi, I'm new to the Elastic Stack. We're trying to improve the performance of our Node.js application with APM.
As you can see in the picture, we're using GraphQL, which means every request goes to the /graphql path on the server.
[image]
Let's go to the transaction details.
This is where I get lost.

  1. Is there any history of requests and their details? Like:
    2018-12-19-18-15: my-api.local:80/graphql
    2018-12-19-18-16: my-api.local:80/settings
    2018-12-19-18-17: my-api.local:80/graphql
    2018-12-19-18-18: my-api.local:80/graphql
    etc. If I click on a specific request, would it show the details of what happened there? Is something like this possible?
  2. Currently the transaction sample page shows some information, but it's not useful for us. For example, why is only 1 request shown? How can we explore every request and its details?

Example: New Relic shows transaction trace details for every request, like when and where that request happened.


(Thomas Watson) #2

Hi Shinebayar

Thanks for trying out Elastic APM and great to see that you got everything up and running :blush:

UI

Before explaining how to see all the data, let me first explain what we're showing in the 2nd screenshot you posted.

The transaction waterfall you see at the bottom of the page is just a sample from the selected "Transactions duration distribution" bucket and in the current time-range. You can see exactly when this sample was recorded from the "Timestamp" field. When you select a different bucket using the "Transactions duration distribution" graph just above the waterfall, a new sample from that bucket will be shown.

It's useful to look at some of the buckets on the right-hand side of the graph, as those contain the requests that took the longest to serve. Those are usually the ones you're interested in.

Query for all data

However, if you would like to see all transactions for this endpoint, you need to go to the Discover tab in Kibana and perform a custom query. Here's an example query using the new Kibana Query Language:

processor.event: "transaction" AND transaction.name: "POST /graphql"

This query will return all transactions in the current time frame with the name POST /graphql. You can customize the query to only return fields you're interested in or filter the results even further.

Special handling of GraphQL requests

As you also mention, I think the root of the problem is that all your transactions are named simply POST /graphql. This isn't really that useful, as it will group all GraphQL queries into one single transaction group. Preferably you'd like distinct groups per query type, so you can explore them individually.

Therefore, the Node.js agent will try to detect if your application is exposing a GraphQL endpoint and name the transactions according to the GraphQL query. E.g. if it sees a query named hello, it will name the transaction hello (/graphql) (we add the HTTP path in there as well just in case you're exposing multiple GraphQL endpoints in the same application).

This however only works if you use either express-graphql or apollo-server-express to respond to GraphQL requests, which are two of the most popular modules for this purpose.

If you use another module, I'd be happy to see if we can add support for it. But in the meantime (or if you use a homemade solution), you can manually override the transaction name with one that makes more sense to you using the apm.setTransactionName() method on the agent.
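
For a homemade GraphQL endpoint, the manual override might look roughly like the sketch below. The helpers graphqlTransactionName and nameGraphqlTransactions are made up for illustration; they imitate the hello (/graphql) naming pattern described above and are not the agent's own implementation. The only agent call used is setTransactionName(), and a stub stands in for the agent so the snippet runs on its own.

```javascript
// Hypothetical helper: derive a transaction name from a GraphQL request body.
// Assumes the body has the usual { query, operationName } shape.
function graphqlTransactionName(body, path) {
  const query = (body && body.query) || '';
  // Try to read a named operation from the start of the query text.
  const match = /^\s*(query|mutation|subscription)\s+([A-Za-z_]\w*)/.exec(query);
  // Prefer an explicit operationName, then the parsed name, then a fallback.
  const name = (body && body.operationName) || (match && match[2]) || 'anonymous';
  return `${name} (${path})`;
}

// Hypothetical Express-style middleware. `agent` only needs to expose
// setTransactionName(), as the Elastic APM Node.js agent does.
function nameGraphqlTransactions(agent, path = '/graphql') {
  return function (req, res, next) {
    agent.setTransactionName(graphqlTransactionName(req.body, path));
    next();
  };
}

// Stand-in for the real agent so the sketch runs without any dependencies.
const stubAgent = {
  lastName: null,
  setTransactionName(name) { this.lastName = name; }
};

const middleware = nameGraphqlTransactions(stubAgent);
middleware({ body: { query: 'query hello { hello }' } }, {}, () => {});
console.log(stubAgent.lastName);
```

In a real application the stub would be replaced by the agent itself (for example the object returned by require('elastic-apm-node').start()), and the middleware would have to run after the request body has been parsed.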

I hope this answers your questions both about the UI in general and about how to better instrument GraphQL queries - if not let me know. Also, please let me know what, if any, module you're using today to respond to GraphQL requests.

/thomas


(Shinebayar Gansukh) #3

Hello Thomas! Thank you very very much for your reply.
I think our guys are using https://www.npmjs.com/package/graphql-server-express by looking at their package.json file. We will definitely try out other graphql packages if possible.
I've 2 questions tho,

  1. On the 2nd screenshot you said we're seeing only sample data, is this sample data selected randomly from all 836 requests on this bucket? I understood that what you're calling bucket is that blue column from Response time distribution graph (selectable by their response time range) right?
  2. Thanks for query example, it returned nice output.

processor.event: "transaction" AND transaction.name: "POST /graphql"

One more thing: Can we see that nice-looking dashboard, like the one in "Transaction sample", for a particular request? If I copy the trace.id from the Discover tab and paste it into the APM dashboard, it shows the Transaction sample for that particular trace. Am I doing it correctly?

In general, can you give us a quick recommendation for our workflow? For example, how do we quickly find buggy or slow queries that are slowing down our website, and what are the most important things to do when using APM? We may be missing key principles of APM Server.
I've tried searching on YouTube and Google, but didn't find many resources about APM.
Thanks again.


(Thomas Watson) #4

Looking at graphql-server-express, it seems that it's an early development edition of apollo-server-express. And the README just says to use apollo-server-express instead, as it should have the same API. So hopefully the switch should be easy :slight_smile:

  1. On the 2nd screenshot you said we're seeing only sample data, is this sample data selected randomly from all 836 requests on this bucket? I understood that what you're calling bucket is that blue column from Response time distribution graph (selectable by their response time range) right?

While not directly random, it's correct that we just choose one of the 836 requests in the selected blue bucket. Roughly speaking, we make a query to Elasticsearch to return one transaction that has the given type+name, is within the bounds of the response time dictated by the bucket, and was recorded within the selected time-range.
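
Roughly, such a lookup could be expressed as an Elasticsearch search body like the one built below. This is only a sketch of the idea, not the query Kibana actually sends: the function name and its parameters are invented, the example values are made up, and the field names (processor.event, transaction.type, transaction.name, transaction.duration.us, @timestamp) are assumed to match how the APM indices store transactions.

```javascript
// Hypothetical builder for a "one sample from this bucket" search body.
function buildSampleQuery({ type, name, minDurationUs, maxDurationUs, from, to }) {
  return {
    size: 1, // a single sample transaction is enough
    query: {
      bool: {
        filter: [
          { term: { 'processor.event': 'transaction' } },
          // The given type + name
          { term: { 'transaction.type': type } },
          { term: { 'transaction.name': name } },
          // Response time bounds dictated by the selected bucket
          { range: { 'transaction.duration.us': { gte: minDurationUs, lt: maxDurationUs } } },
          // The selected time range
          { range: { '@timestamp': { gte: from, lte: to } } }
        ]
      }
    }
  };
}

// Made-up example values.
const sampleQuery = buildSampleQuery({
  type: 'request',
  name: 'POST /graphql',
  minDurationUs: 500000,
  maxDurationUs: 600000,
  from: '2018-12-19T18:00:00Z',
  to: '2018-12-19T19:00:00Z'
});
console.log(JSON.stringify(sampleQuery, null, 2));
```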

Can we see that nice-looking dashboard, like the one in "Transaction sample", for a particular request? If I copy the trace.id from the Discover tab and paste it into the APM dashboard, it shows the Transaction sample for that particular trace. Am I doing it correctly?

Depends what you mean by "paste it on APM dashboard". But you should be able to do what you're asking about using the following approach:

  1. From Discover take note of the transaction id (and its trace id) for the transaction you want to see the waterfall for
  2. Go to the Transaction detail page that has the same name as the transaction whose ids you just found
  3. Append the ids to the URL in the following form: &transactionId=<id>&traceId=<id>, e.g. &transactionId=42cdd1306af31b5b&traceId=66e024c806473ed1e5ac3d58546d197d.

This should load up the transaction with the given id and show its details.
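
Step 3 is plain string concatenation, so it is easy to script if you need it often. A minimal sketch: the helper name and the example page URL are placeholders, and only the parameter names transactionId and traceId come from the steps above.

```javascript
// Hypothetical helper: append the two ids to a Transaction detail page URL.
function withTransactionIds(pageUrl, transactionId, traceId) {
  // Use "&" when the URL already carries parameters, "?" otherwise.
  const separator = pageUrl.includes('?') ? '&' : '?';
  return pageUrl + separator +
    'transactionId=' + encodeURIComponent(transactionId) +
    '&traceId=' + encodeURIComponent(traceId);
}

// Placeholder URL, not a real Kibana address.
const detailPageUrl = 'https://kibana.example/app/apm#/placeholder-detail-page?_g=()';
const sampleUrl = withTransactionIds(
  detailPageUrl,
  '42cdd1306af31b5b',
  '66e024c806473ed1e5ac3d58546d197d'
);
console.log(sampleUrl);
```

The ids are appended to the end of the string rather than set through a URL parser, since the instructions above describe appending them to the URL as it appears in the address bar.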

In general, can you give us a quick recommendation for our workflow? For example, how do we quickly find buggy or slow queries that are slowing down our website, and what are the most important things to do when using APM? We may be missing key principles of APM Server.

In general, you shouldn't need to do the things described above.

What is most important for you is to make sure the transactions are named properly. Switching from graphql-server-express to apollo-server-express seems to be the easiest way to do that. If that isn't possible, adding a line of code to your application that manually names the transactions using apm.setTransactionName() is the way to go.

When all the transactions are named correctly, you should be able to use the front page for the service that's showing all the transaction groups. Make sure that you sort by Impact.

By focusing on the transaction groups with the highest impact first, you spend your time fixing the slow responses that most users experience.

After selecting an appropriate group with a high impact, you normally see a response time distribution curve that looks like this:

As you can see from this curve, most requests are served fast, as the bars (buckets) to the left in the graph are higher. Then you see a so-called "long tail" of lower and lower buckets extending to the right. Those are normally the requests I would focus on, as they for some reason took longer than average.
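
To make the "long tail" idea concrete, here's a small sketch that groups some response times into fixed-width buckets, similar in spirit to the distribution graph. The durations and the 100 ms bucket width are invented for the example and have nothing to do with how the APM UI picks its buckets.

```javascript
// Group durations into fixed-width buckets and count the entries in each.
// Returns [bucketStartMs, count] pairs sorted by bucket start.
function bucketize(durationsMs, bucketWidthMs) {
  const counts = new Map();
  for (const d of durationsMs) {
    const start = Math.floor(d / bucketWidthMs) * bucketWidthMs;
    counts.set(start, (counts.get(start) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => a[0] - b[0]);
}

// Made-up response times in milliseconds.
const durations = [20, 35, 40, 55, 60, 80, 90, 120, 130, 250, 480, 910];
const buckets = bucketize(durations, 100);

for (const [start, count] of buckets) {
  console.log(`${start}-${start + 100} ms: ${'#'.repeat(count)}`);
}
```

Most of the made-up requests land in the first bucket, while the few in the sparse buckets further right are the ones worth inspecting.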

By clicking one of those buckets to the right in the graph, you'll be presented with a span waterfall showing what a typical request in that bucket was doing. And hopefully from investigating this span waterfall you can see why it was slow and fix it.

This is the most common approach our users take to hunt down and fix slow requests. And it doesn't require you to dive into all the transactions with a manual query as described previously.

This was a very short explanation. For a longer introduction to this subject, I recommend watching some of our APM webinars. This one is our most popular: https://www.elastic.co/webinars/elasticsearch-apm-overview

Cheers,
Thomas


(Thomas Watson) #5

A colleague just told me that you can of course just use the Query Bar at the top to filter the entire page to show just one transaction/trace. That way you don't have to mess around with the URL.

E.g. if you want to see the span waterfall for a transaction with the ID 42cdd1306af31b5b, just add the following to the Query Bar:

transaction.id: "42cdd1306af31b5b"

Or if you have the trace id, you could write:

trace.id: "66e024c806473ed1e5ac3d58546d197d"

That will filter the entire view to just that transaction and, as a result, auto-select the bucket it is in and show its waterfall.


(Shinebayar Gansukh) #6

Thank you Thomas, we're going to upgrade our package to apollo-server-express this week. Let's see how it goes. Thanks for the video; with this information I think we should be good to go.

