I have a Node.js backend and I have set up OpenTelemetry to export signals to the OpenTelemetry Collector, which in turn exports the signals (traces, metrics, and logs) to Elastic. During performance testing, an API was executed around 600 times. Ideally this should have created the same number of traces, and all of them should be visible in ELK. However, only 500 records are visible. I also tried changing the ELK configuration mentioned in "How do I get past the 500 record limitation?", but that setting isn't working for me and I still see only 500 records/traces instead of all 600. I'm trying to visualize the traces under the Services --> Transactions tab.
The trace sample is always limited to 500 even if I execute the API 600 or 700 times, and this is not limited to the Node.js OpenTelemetry integration; services built in Python and Java behave the same way.
Hi Harshul, an APM data stream must have been created. Could you run a count query and verify how many transactions/traces have been recorded?
Hi @ashishtiwari1993, I'm not sure how to do that. Can you please guide me on how to run the count query, and where specifically? I can see traces in the Discover tab. However, only 500 are visible if I navigate from the Services tab, click on the service, and open the desired transaction. I'm pretty sure that 600 traces are being created, but ELK is somehow putting a limit of 500 traces on this specific view (Services --> Transactions).
Hi Harshul,
Can you go to Dev Tools and run the queries below?
GET _cat/indices/*apm*?v
GET /_data_stream/*apm*
These will give you the list of indices and data streams created for APM. Once you have the data stream name, run the count query:
GET <data-stream-name>/_count
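For example, assuming the default APM data stream naming (the exact names depend on your installation and namespace), counting the recorded transaction documents might look like this:
GET traces-apm-default/_count
The response includes a "count" field, which you can compare against the 600 API executions to confirm whether all traces were actually ingested.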
The first command
GET _cat/indices/*apm*?v
returned a list of more than 50 indices, and I have no clue which one to run a count on. Can you please let me know if there is a configuration setting to view more than 500 traces?
How about the data stream query?
@ashishtiwari1993 It's a long list of indices, and I'm not able to identify the correct data stream for my service, as it's completely managed by our DevOps team. There seems to be no way to know which data stream to run the count query on. May I know the reason for looking into the data stream and the count? That would help me understand the approach better.
Does anyone have a solution for this? I've been awaiting a response for quite a few days now.
Hi @Harshul, Welcome to the community!
I think there is some confusion about how to use this screen.
First, you do have more than 500 traces: you have a total of 851 traces in the timeframe set by the time picker at the top (which you did not show). That could be 1 minute, 24 hours, etc.
I would ask: what are you actually trying to accomplish?
Are you trying to look for transactions within a specific latency range?
Maybe you're trying to find one specific trace?
The trace sample is a sampling of 500 traces across the entire latency response spectrum.
It is limited to 500 because it does not make sense to click through more than 500 traces in detail by hand.
Take a look at my case: I have 18K traces. It would not make sense to sift through them all by hand; I would never want to click through 18K traces and inspect the results one by one.
What I might want to do instead is focus in on certain areas, especially poorly performing ones.
You can drag-select the area of the chart you want to focus on, and the 500 trace samples will then be drawn from that region.
That lets you really focus in.
Also keep in mind that this is all governed by the time picker at the top, and you can always filter for a specific transaction or trace.
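For example, the KQL bar at the top accepts filters on any field in the APM documents. The values below are hypothetical placeholders; substitute your own transaction name or trace ID:
transaction.name : "GET /api/orders"
trace.id : "0af7651916cd43dd8448eb211c80319c"
Everything on the page, including the trace samples, is then scoped to the filtered results.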
And don't forget to try the latency correlations feature; it can be very helpful.
On your data, use it to look at the slow responses.
Hope this helps!
Hi @stephenb ,
I'm working with traces in ELK, and I'm facing a challenge identifying specific failures. With around 3,000 traces, some API calls are failing, and I need to pinpoint the exact reason behind these failures. Currently, it seems I can only find these issues by navigating to the failing traces in the Transactions tab. Is there a way to expand the latency graph and directly link the failing traces in the table below for easier identification? Any guidance would be appreciated!
Thanks!
I'm not sure exactly what you mean, or what your definition of a failed transaction is.
What version are you on?
First, failed transactions (by our definition) show up in a different color.
Second, at the top there is a KQL filter bar where you can filter on anything you want (your definition of a failed transaction), and then everything on the page will be filtered. This is often overlooked.
You can search/filter against any field in the APM data.
Perhaps share your failed-transaction search criteria.
You can search on whatever is in the metadata.
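As a concrete sketch, assuming the standard Elastic APM field mappings, transactions that ended in error carry event.outcome set to "failure", so a KQL filter like this in the bar at the top narrows the whole page (latency chart, throughput, and trace samples) to only the failing calls:
event.outcome : "failure"
From there you can open an individual failing trace sample and inspect its error details to find the root cause.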