log file format
apm Span stack details
I can't understand the stack information provided by apm.
The screenshot you pasted from Kibana (the second one) is a stack trace for a given span, not for a given error. If there is also an error associated with that span, there should be another stack trace on the error itself. The two stack traces you show can't be the same, though: the first one shows an exception, the second doesn't.
Having said that, there is still room for improvement in capturing these stack traces. First of all, we could do better with async methods; there is already an issue for that in the agent repo.
Also, on the second screenshot I see that an HTTP request happens, so that is an automatically captured span for an outgoing HTTP request. Unfortunately, the way we capture the HTTP request does not give us a callstack that also contains the user code; we only see the stack after the async call. To improve this I opened this issue.
Nevertheless, for the exception I'd expect an error to show up, and the callstack on that error should be very similar to what you show in your first screenshot.
@GregKalapos
Thank you for your reply. Sorry, I could not find a picture that matches the APM stack, so I posted a similar one.
Indeed, I need the HTTP request callstack.
I want to see the cause of a back-end 500 error. APM does not show me stack information like the one in the log file, and we are used to reading the stack format in the log.
Finally, do I need to wait for the next version to see the "stack format in the log file"?
Hi @wajika
I think you also asked on the GitHub PR, but let’s follow up here as well.
So, yeah, first of all there were some stack trace related PRs merged, and in the next release you’ll be able to see where in your code the outgoing HTTP request happens.
Now, I’d like to also add some comments to the screenshot you sent:
As it seems, the outgoing HTTP request itself returned HTTP 200, but your service (the POST IoT/CheckAndGetProductInfo) returned HTTP 500.
Of course I don’t know the reason for that, but I would like to mention that if you just set the response status code to HTTP 500 in your service and no exception leaves the pipeline, the agent won’t be able to capture the error. Similarly, if you catch every single exception in your service and just return HTTP 500, no exception will leave the pipeline and we won’t be able to capture any exception either.
In those cases the easiest approach is to capture the exception manually when you handle it. Here is some doc on it.
So if you have some global error handling part that makes sure no exception leaves the pipeline and your service just returns HTTP 500, you can do something like this:
catch (Exception e) // Some global error handler
{
    Agent.Tracer.CurrentTransaction?.CaptureException(e);
    // rest of your code
}
That code will add the exception to your transaction and it'll show up on the UI.
Thank you for your reply.
Sorry. I don't understand what you mean
question 1
As it seems the outgoing HTTP request itself returned HTTP200, but your service (the POST IoT/CheckAndGetProductInfo) returned HTTP 500.
In this case, does APM consider it a success?
question 2
no exception leaves the pipeline then the agent won’t be able to capture the error
AND
Similarly if you for example catch every single exception in your service and just return HTTP500 then no exception will leave the pipeline and we won’t be able capture any exception either.
I didn't understand this.
I still have one thing I don’t understand: the service generated an HTTP 500 error, so why is it showing the error code of the APM agent?
Hi @wajika
Question1:
What will happen is that the StatusCode will be set to HTTP 500, which the UI will show with a red background, but no error will be captured.
Question2:
Let me illustrate this with some code. Let’s say you have something like this:
app.Run(async context =>
{
    context.Response.StatusCode = (int)HttpStatusCode.InternalServerError;
    await context.Response.WriteAsync("Hello, World!");
});
Now the context.Response.StatusCode = (int)HttpStatusCode.InternalServerError; line could be anywhere. If you have, let’s say, ASP.NET Core MVC and you do something like this in a controller method, then it’s the same:
public IActionResult Index()
{
    try
    {
        // Do some work
    }
    catch (Exception e)
    {
        return StatusCode(500);
    }
    return View();
}
No exception leaves the pipeline in those cases, so the agent has no chance to capture it for you. It’ll capture the HTTP 500, which is the response code of the request, but it won’t capture an error, since there was no error in the pipeline: in the first snippet there is no exception at all, and in the second one you handled it. That’s why I suggested capturing the error manually in my previous comment.
On the other hand, if you do this:
app.Run(async context =>
{
    throw new Exception();
});
or this:
public IActionResult Index()
{
    try
    {
        // Do some work
    }
    catch (Exception e)
    {
        throw;
    }
    return View();
}
Then there is an exception leaving the pipeline so the agent will observe that and show an error on the APM UI.
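To make the difference concrete, here is a toy console model of the two cases. It is only a sketch and not how the agent is actually implemented: ObserveRequest is a hypothetical stand-in for the agent, and SwallowingHandler and ThrowingHandler are hypothetical stand-ins for your service code. The point is that the observer can record an error only when the exception propagates out of the handler.

```csharp
using System;

// Toy model only: ObserveRequest stands in for the agent. It wraps the
// handler and can report an error only if an exception escapes from it.
public static class PipelineDemo
{
    // Runs the handler and describes what an outside observer was able to see.
    public static string ObserveRequest(Func<int> handler)
    {
        try
        {
            int statusCode = handler();
            return $"status {statusCode}, no error captured";
        }
        catch (Exception e)
        {
            return $"status 500, error captured: {e.Message}";
        }
    }

    // Catches the exception itself and only reports a 500 status code.
    public static int SwallowingHandler()
    {
        try
        {
            throw new InvalidOperationException("database unavailable");
        }
        catch (Exception)
        {
            return 500;
        }
    }

    // Lets the exception leave the handler.
    public static int ThrowingHandler()
    {
        throw new InvalidOperationException("database unavailable");
    }

    public static void Main()
    {
        // prints "status 500, no error captured"
        Console.WriteLine(ObserveRequest(SwallowingHandler));
        // prints "status 500, error captured: database unavailable"
        Console.WriteLine(ObserveRequest(ThrowingHandler));
    }
}
```

Both requests end with a 500 status, but only the second one gives the observer an exception to record, which mirrors the two groups of snippets above.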
I still have one thing I don’t understand: the service generated an HTTP 500 error, so why is it showing the error code of the APM agent?
Sorry, I don’t fully understand. If your question is why you see APM code on the callstack, it’s because the agent subscribes to some internal events, so at the point where the stack trace is captured the agent is already on the callstack. We don’t trim those frames from the callstack: we show you the real callstack, and since there are agent frames on it, those show up too.
If your question is different feel free to elaborate.
@GregKalapos
Let me explain my thinking.
For example, if an HTTP 500 error occurs, it is a backend service error, so I need to find out from the APM stack information which line of the backend service code is causing the problem. (But I could not find where the stack of the service is displayed.)
Similar to the picture below.
After using APM, I found that the information generated by Elastic APM is different from what I need, so I don’t know whether Elastic APM lacks this function or I am using it wrong.
I am not a developer. I communicated with the developers. Our project uses a unified interceptor to intercept errors.
The code looks like this:
https://paste.ubuntu.com/p/ZxCHD8KjpJ/
@GregKalapos
I think error tracking and apm should be closely related. Has the elastic team considered this when designing the product?
Thanks for the code snippet @wajika, that makes the situation clear.
So you have exactly the situation I described earlier - you have code in the application which handles the exception, therefore the agent has no way to detect it. In this case you have an ASP.NET Core filter which handles the exception and sets the status code manually.
This happens in line 78 in the code snippet:
context.Exception = null; //Handled!
With this, no exception leaves the pipeline, so there is no error to catch. Also, in GetStatusCode you set the status code manually, same as in my example above. So please keep in mind that returning HTTP 5xx does not mean there is an error to capture.
Now, to the solution: like I said before, you can still manually capture these exceptions before you handle them. You can do something like this:
// Minimal sample exception filter showing how to capture the exception with the Elastic .NET Agent API
// (requires: using Microsoft.AspNetCore.Mvc.Filters;)
public class SampleExceptionFilter : IExceptionFilter
{
    public void OnException(ExceptionContext context)
    {
        // CurrentTransaction is null when no transaction is active, so guard it with ?.
        Elastic.Apm.Agent.Tracer.CurrentTransaction?.CaptureException(context.Exception);
        context.ExceptionHandled = true;
    }
}
If you relay this to your .NET developers, I think they will be able to understand it and take action.
To your question:
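For completeness, a filter like this only runs once it is registered with MVC. The following is a minimal sketch of one way to do that, assuming an ASP.NET Core 3.x application with a conventional Startup class and the SampleExceptionFilter class from above in the same project; on ASP.NET Core 2.x the same options lambda would typically be passed to AddMvc instead. Your project's actual startup code may look different.

```csharp
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Register the filter globally so it runs for every controller action.
        // SampleExceptionFilter is the sample class from this thread (assumed
        // to be defined elsewhere in the project).
        services.AddControllers(options =>
        {
            options.Filters.Add<SampleExceptionFilter>();
        });
    }
}
```

If the project already has a unified exception filter, it is probably simpler to add the single CaptureException call to that existing filter, before it clears the exception, than to register a second filter.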
I think error tracking and apm should be closely related. Has the elastic team considered this when designing the product?
Those are closely related - we capture errors on spans and transactions and you can jump from the errors to transactions and vice versa.
In addition, I would like to make another suggestion: can the Transaction column support ordering by time?
It is inconvenient when we are looking for the latest record.
This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.