APM Sidekiq maximum duration for a trace?

Kibana version: 7.4

Elasticsearch version: 7.4

APM Server version: 7.4

APM Agent language and version: Latest v3, Ruby

We're using Ruby APM to trace some Sidekiq jobs and the APM agent picks these up nicely. However, we have a large number of jobs that run for over an hour (some even run for multiple hours - don't ask!) and those transactions don't appear to be captured. Looking at the bucketing of transactions, the longest one I see is 15 minutes. Yet looking in Sidekiq there are hundreds of jobs that are running for > 1 hour.

One thing I notice is that for this service I did have the ELASTIC_APM_TRANSACTION_SAMPLE_RATE set to 0.7 so I'm wondering if it just wasn't sampling them. I'd have expected to see at least a couple though.
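
As a rough back-of-the-envelope check of that expectation, here is a small sketch. It assumes each transaction is sampled independently at the configured rate, which is an assumption about the sampling model rather than something confirmed about the agent's internals. The helper name `prob_none_sampled` is made up for illustration.

```ruby
# Probability that none of `jobs` transactions is sampled, assuming each one
# is sampled independently with probability `sample_rate` (an assumption).
def prob_none_sampled(sample_rate, jobs)
  (1.0 - sample_rate)**jobs
end

puts prob_none_sampled(0.7, 2)    # only two long jobs
puts prob_none_sampled(0.7, 100)  # a hundred long jobs
```

Under that assumption, missing every one of hundreds of long jobs at a 0.7 sample rate is vanishingly unlikely, so sampling alone seems an improbable explanation.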

Is there some maximum limit to a transaction trace in either time or size and can it be tweaked if so?

Cheers

Dave

Hi Dave!

There's no max length on transactions or spans, but to be honest we haven't tested the agent with such long-running Sidekiq jobs.

Your theory around sampling could be the reason but I'm not sure. I'll investigate a bit and get back to you.

It could also be that Kibana ranks these jobs as having too low an impact compared with other jobs to show them. Do you have many kinds of jobs?

Try running this in the developer console and see if the total number of transaction groups is bigger than 100:

GET apm-*-transaction*/_search
{
  "size": 0,
  "query": {
    "term": {
      "service.name": "YOUR_SERVICE"
    }
  },
  "aggs": {
    "transaction_groups": {
      "cardinality": {
        "field": "transaction.name"
      }
    }
  }
}
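
If you would rather script the check, the same request body can be built in Ruby. This is only a sketch: the helper names `transaction_group_query` and `transaction_group_count` are made up for illustration, and sending the request to your cluster is left out on purpose.

```ruby
require "json"

# Build the same request body as the console query.
# "YOUR_SERVICE" is a placeholder, exactly as in the query itself.
def transaction_group_query(service_name)
  {
    size: 0,
    query: { term: { "service.name" => service_name } },
    aggs: {
      transaction_groups: { cardinality: { field: "transaction.name" } }
    }
  }
end

# Pull the cardinality value out of a parsed search response (a Hash).
def transaction_group_count(response)
  response.dig("aggregations", "transaction_groups", "value")
end

puts JSON.generate(transaction_group_query("YOUR_SERVICE"))
```

Pass the parsed JSON response to `transaction_group_count` and compare the result with 100.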

Hi Mikkel,
Ran the query. It's only 9. I did some digging around after seeing this and it was weird. I found that changing the time period in the APM console to "24 hours" showed me the longer traces even though they were within the last 2 hours. When I had the time set to "2 hours", they didn't appear. I can't quite explain that.

I looked at the agent logs and found I was getting Queue is full (256 items), skipping… so I tweaked the sample rate and added 2 threads to the pool. Coming in this morning, I've got a LOT more of these long-running jobs now (some ran for 3 hours), so it's good to know there's actually no limit on the duration of a transaction.

Of course, what I've just hit now is the 500 spans limit, so I'm going to experiment with tweaking ELASTIC_APM_TRANSACTION_MAX_SPANS to find a compromise between memory and getting full traces. I suspect I won't find one, since I know from looking at the spans we do get that these jobs make a LOT of calls (it's a long story, but these jobs back up data from e-commerce stores using their API, and the number of calls depends on the amount of data to be backed up).
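
For anyone landing here later, a minimal sketch of setting these knobs from Ruby before the agent starts. Every value below is hypothetical, not what we actually run. ELASTIC_APM_POOL_SIZE and ELASTIC_APM_API_BUFFER_SIZE are, as far as I can tell, the environment variables behind the agent's pool_size and api_buffer_size options, so check them against the docs for your agent version. The helper `apply_apm_settings` is made up for illustration.

```ruby
# Hypothetical values for illustration; tune them against your own workload.
APM_SETTINGS = {
  "ELASTIC_APM_TRANSACTION_SAMPLE_RATE" => "0.5",   # sample rate
  "ELASTIC_APM_POOL_SIZE"               => "3",     # assumed name: worker threads
  "ELASTIC_APM_API_BUFFER_SIZE"         => "512",   # assumed name: queue size
  "ELASTIC_APM_TRANSACTION_MAX_SPANS"   => "1000"   # spans kept per transaction
}.freeze

# Apply each setting only if the environment has not already set it, so
# values supplied by the deployment still win.
def apply_apm_settings(env = ENV, settings = APM_SETTINGS)
  settings.each { |key, value| env[key] ||= value }
  env
end

# Must run before the APM agent is started in the Sidekiq process.
apply_apm_settings
```

Setting the same variables in the process environment does the same job; the point is only that they have to be in place before the agent boots.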

Loving how tweakable all the APM parameters are though - it really feels like I can tune this to get a good compromise between "all the data" and "something that's useful".

Thanks a bunch for the update, Dave! Let us know if you need any further assistance.