Node agent "could not capture stack trace for span"


(Jeff "fejalish" Dawson) #1

Hello,

I have the Node APM agent running in a Koa project, and it's successfully sending transactions for basic endpoints. But for any endpoint connecting to Redis or Postgres, or even making a basic HTTP request, I get the following debug log:

"could not capture stack trace for span"

with the error details:

"err: 'Could not process callsites'"

This comes right after the various "intercepted call to..." debug messages.

I'm trying to trace back where the issue is coming from, but so far without any luck.

Here's some additional info about my setup: the app runs inside a Docker container and uses Bluebird to wrap the Redis calls. All libraries are within the supported version ranges.

Any help or suggestions would be appreciated.

Cheers,
Jeff.


(Thomas Watson) #2

Thanks for reaching out about this issue.

Are the spans that log this debug message still sent correctly to the APM Server regardless, or do your transactions show up without any spans?


(Jeff "fejalish" Dawson) #3

Hi Thomas,

The endpoints that error do not show up in the APM server.

Here's an example debug log for a Redis call:

elastic-apm init span { id: '634772f8-1822-49c9-8e5b-44c8086cf078' } +3ms
elastic-apm intercepted call to RedisClient.prototype.internal_send_command { id: '634772f8-1822-49c9-8e5b-44c8086cf078', command: 'setex' } +326ms
elastic-apm start span { id: '634772f8-1822-49c9-8e5b-44c8086cf078', name: 'SETEX', type: 'cache.redis' } +10ms
elastic-apm could not capture stack trace for span { id: '634772f8-1822-49c9-8e5b-44c8086cf078', name: 'SETEX', type: 'cache.redis', err: 'Could not process callsites' } +3ms
elastic-apm ended span { id: '634772f8-1822-49c9-8e5b-44c8086cf078', name: 'SETEX', type: 'cache.redis', truncated: false } +1ms
elastic-apm setting transaction result { id: '634772f8-1822-49c9-8e5b-44c8086cf078', result: 'HTTP 2xx' } +332ms
elastic-apm adding transaction to queue { id: '634772f8-1822-49c9-8e5b-44c8086cf078' } +30s
elastic-apm setting timer to flush queue: 10445ms +30s
elastic-apm ended transaction { id: '634772f8-1822-49c9-8e5b-44c8086cf078', type: 'request', result: 'HTTP 2xx', name: 'GET /v1/users/me' } +4ms
error encoding transaction 634772f8-1822-49c9-8e5b-44c8086cf078: Could not process callsites
elastic-apm flushing queue +10s
elastic-apm flushing queue +10s

(Thomas Watson) #4

Thanks for the detailed log.

I'm not sure why the stack trace cannot be collected in your case (there are numerous legitimate reasons why this can happen, but I'm not ruling out a bug either). That said, I can see that we have a bug that causes the span to be dropped when this happens. I've made a PR to fix this issue (PR#321).

I'll make sure to let you know once it has been merged and released.


(Thomas Watson) #5

Which version of Node.js are you running btw?


(Jeff "fejalish" Dawson) #6

Ah, I see. I might patch my local module based on this PR to see what the results are. Cool.


(Jeff "fejalish" Dawson) #7

We're running the node:8.9-alpine Docker image.


(Thomas Watson) #8

Version 1.5.0 of the Node.js agent has now been released with a fix for the dropped spans that you discovered.

Regarding the issue of the missing span stack traces, I'd love it if you could try the new version and see whether it's all Redis spans that are missing a stack trace or just a subset.


(Jeff "fejalish" Dawson) #9

Hallo,

I've updated to 1.5.0 and more data is being sent to the APM Server, which is good. But I'm also seeing more detailed errors.

Here are the error messages I'm receiving for all HTTP requests, Redis and PostgreSQL calls:

start span { ...
Elastic APM caught unhandled exception: Cannot read property '_agent' of undefined
error while getting error callsites: Could not process callsites
logging error e934acb1-05ee-483d-af13-6418096ede81 with Elastic APM
sending errors payload
ended span { ...

The exception type is listed as TypeError as well.


(Thomas Watson) #10

Oh, that's embarrassing. You discovered another bug in the agent. I've just fixed it and released a patch release, version 1.5.1.

I'm really sorry about this. Could you please try with this version?


(Jeff "fejalish" Dawson) #11

Hey hey, just wanted to say that the new release improved things. Thanks so much for the updates! :smiley:

It looks like I'm not getting stack traces on any of my spans, with the original "Could not process callsites" error message - but I haven't had a chance to look into it further.

Cheers!


(Thomas Watson) #12

So spans other than the Redis ones are also missing stack traces?


(Jeff "fejalish" Dawson) #13

Yeah, I'm still seeing it for all Redis, Postgres and HTTP requests. Here are the relevant debug logs for the spans from the last time I tried it:

start span { id: 'b05c1059-8300-452d-8749-925167fd7f8e',
  name: 'GET',
  type: 'cache.redis' }
could not capture stack trace for span { id: 'b05c1059-8300-452d-8749-925167fd7f8e',
  name: 'GET',
  type: 'cache.redis',
  err: 'Could not process callsites' }
ended span { id: 'b05c1059-8300-452d-8749-925167fd7f8e',
  name: 'GET',
  type: 'cache.redis',
  truncated: false }
...
start span { id: 'b05c1059-8300-452d-8749-925167fd7f8e',
  name: 'SELECT FROM users',
  type: 'db.postgresql.query' }
could not capture stack trace for span { id: 'b05c1059-8300-452d-8749-925167fd7f8e',
  name: 'SELECT FROM users',
  type: 'db.postgresql.query',
  err: 'Could not process callsites' }
intercepted end of pg.Client.prototype.query { id: 'b05c1059-8300-452d-8749-925167fd7f8e' }
ended span { id: 'b05c1059-8300-452d-8749-925167fd7f8e',
  name: 'SELECT FROM users',
  type: 'db.postgresql.query',
  truncated: false }
...
start span { id: 'b05c1059-8300-452d-8749-925167fd7f8e',
  name: 'GET www.googleapis.com/...',
  type: 'ext.http.http' }
could not capture stack trace for span { id: 'b05c1059-8300-452d-8749-925167fd7f8e',
  name: 'GET www.googleapis.com/...',
  type: 'ext.http.http',
  err: 'Could not process callsites' }
intercepted http.ClientRequest response event { id: 'b05c1059-8300-452d-8749-925167fd7f8e' }
intercepted http.IncomingMessage end event { id: 'b05c1059-8300-452d-8749-925167fd7f8e' }
ended span { id: 'b05c1059-8300-452d-8749-925167fd7f8e',
  name: 'GET www.googleapis.com/...',
  type: 'ext.http.http',
  truncated: false }

(Thomas Watson) #14

Thanks for the feedback.

I've created a branch which will output a little more debug information regarding the "could not capture stack trace for span" errors.

Would it be possible for you to install the agent from that branch instead of from npm?

It should be as easy as changing the value of your elastic-apm-node dependency in package.json to github:watson/elastic-apm-node#debug.
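
In package.json, that dependency change would look like this (assuming no other dependencies, which are elided here):

```json
{
  "dependencies": {
    "elastic-apm-node": "github:watson/elastic-apm-node#debug"
  }
}
```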


(Jeff "fejalish" Dawson) #15

Hey hey, I've switched to that branch, and here are the relevant debug parts from two Redis and Postgres calls.

(I still plan to try running the app outside of Docker to rule that out as a factor - but haven't had a chance yet.)

Cheers.


Redis:

could not capture stack trace for span { id: '1914158b-09cc-4ea5-bc84-5817f5f70415',
  name: 'GET',
  type: 'cache.redis',
  err:
    { Error: Could not process callsites (arr:false, t1:false, t2:false)
    at callsites (/node/node_modules/stackman/index.js:55:18)
    at Object.callsites (/node/node_modules/stackman/index.js:46:44)
    at Span._recordStackTrace (/node/node_modules/elastic-apm-node/lib/instrumentation/span.js:120:12)
    at Span.start (/node/node_modules/elastic-apm-node/lib/instrumentation/span.js:40:10)
    at RedisClient.wrappedInternalSendCommand [as internal_send_command] (/node/node_modules/elastic-apm-node/lib/instrumentation/modules/redis.js:40:14)
    at RedisClient.get (/node/node_modules/redis/lib/commands.js:58:25)
    at /node/node_modules/es6-promisify/dist/promisify.js:75:41
    at new Promise (<anonymous>)
    at Object.get (/node/node_modules/es6-promisify/dist/promisify.js:54:20)
    at module.exports.RedisStore.get (/node/node_modules/koa-redis/index.js:120:32)
    at RedisStore.get.next (<anonymous>)
    at onFulfilled (/node/node_modules/co/index.js:65:19)
    at /node/node_modules/co/index.js:54:5
    at new Promise (<anonymous>)
    at module.exports.co (/node/node_modules/co/index.js:50:10)
    at module.exports.createPromise [as get] (/node/node_modules/co/index.js:30:15)
    at ContextSession.initFromExternal (/node/node_modules/koa-session/lib/context.js:84:35)
    at session (/node/node_modules/koa-session/index.js:39:32)
    at dispatch (/node/node_modules/koa/node_modules/koa-compose/index.js:42:32)
    at next (/node/node_modules/koa/node_modules/koa-compose/index.js:43:18)
    at pino (/node/node_modules/koa-pino-logger/logger.js:13:12)
    at dispatch (/node/node_modules/koa/node_modules/koa-compose/index.js:42:32)
    at /node/node_modules/koa/node_modules/koa-compose/index.js:34:12
    at Application.handleRequest (/node/node_modules/koa/lib/application.js:150:12)
    at Server.handleRequest (/node/node_modules/koa/lib/application.js:132:19)
    at emitTwo (events.js:126:13)
    at Server.emit (events.js:214:7)
    at Server.emit (/node/node_modules/elastic-apm-node/lib/instrumentation/http-shared.js:45:19)
    at parserOnIncoming (_http_server.js:602:12)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:116:23)
    ...,
    [message]: 'Could not process callsites (arr:false, t1:false, t2:false)' } }

Postgres:

could not capture stack trace for span { id: '1914158b-09cc-4ea5-bc84-5817f5f70415',
  name: 'SELECT FROM users',
  type: 'db.postgresql.query',
  err:
   { Error: Could not process callsites (arr:false, t1:false, t2:false)
    at callsites (/node/node_modules/stackman/index.js:55:18)
    at Object.callsites (/node/node_modules/stackman/index.js:46:44)
    at Span._recordStackTrace (/node/node_modules/elastic-apm-node/lib/instrumentation/span.js:120:12)
    at Span.start (/node/node_modules/elastic-apm-node/lib/instrumentation/span.js:40:10)
    at Client.wrappedFunction [as query] (/node/node_modules/elastic-apm-node/lib/instrumentation/modules/pg.js:69:16)
    at connect (/node/node_modules/pg-pool/index.js:256:14)
    at client.connect (/node/node_modules/pg-pool/index.js:227:11)
    at Connection.<anonymous> (/node/node_modules/pg/lib/client.js:150:7)
    at Object.onceWrapper (events.js:315:30)
    at emitOne (events.js:121:20)
    at Connection.emit (events.js:211:7)
    at Socket.<anonymous> (/node/node_modules/pg/lib/connection.js:117:12)
    at emitOne (events.js:116:13)
    at Socket.emit (events.js:211:7)
    at addChunk (_stream_readable.js:263:12)
    at readableAddChunk (_stream_readable.js:250:11)
    at Socket.Readable.push (_stream_readable.js:208:10)
    at TCP.onread (net.js:594:20)
    ...,
    [message]: 'Could not process callsites (arr:false, t1:false, t2:false)' } }

(Thomas Watson) #16

It looks like you have another module dependency in your app that's overwriting the Error.prepareStackTrace function without calling the original function.

The Elastic APM agent depends on this function to generate the stack trace for the span.

Would it be possible for you to search all the modules inside your node_modules directory (you can skip the devDependencies if you like) for the string prepareStackTrace and share the result? Alternatively, you could share the dependencies section from your package.json file.

If we can narrow down which module is causing this, I might be able to find a workaround.
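
To illustrate what the agent is running into (a minimal sketch, not agent code): V8 invokes Error.prepareStackTrace when error.stack is first accessed, and whatever the hook returns becomes the value of .stack. The agent's stack collector expects a hook that hands back the raw CallSite array; a module that replaces the hook with a string formatter leaves the agent with a string where it expects an array:

```javascript
// Agent-style hook: ask V8 for the raw CallSite objects.
Error.prepareStackTrace = (err, callsites) => callsites
const okFrames = new Error('ok').stack // .stack is now the CallSite array itself
console.log(Array.isArray(okFrames))  // true

// A non-cooperative module clobbers the hook with a string formatter
// (similar in spirit to what source-map-support does):
Error.prepareStackTrace = (err, stack) =>
  String(err) + stack.map(frame => '\n    at ' + frame).join('')
const broken = new Error('broken')
console.log(typeof broken.stack) // 'string' -- no callsites left to process

Error.prepareStackTrace = undefined // restore the default formatter
```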


(Connor Patterson) #17

Hi Thomas,

I'm not sure if the issue I'm experiencing is related. When running our app locally, all spans seem to be captured fine. But in our production environment, it looks like no spans are being captured. In Kibana we see "No spans available for this transaction.", and I've queried Elasticsearch directly and didn't see any spans for certain transactions that should have had them.

Our application uses the Sails framework. We require and start the agent at the beginning of our app.js file. We are using agent 1.5.1 and running Node 8.11. I searched for Error.prepareStackTrace and see that Sails includes some modules which overwrite this function (although it seems they reassign it back to itself). Is there anything that comes to mind that we might be doing wrong? Or do you have any pointers on using Elastic APM with a Sails.js app?

Thanks!


(Thomas Watson) #18

You're the first to let us know of any issues with Sails, but I don't know how many people have used it together with Elastic APM yet. I'll investigate and let you know what I find.

Regarding Sails in general: the Elastic APM Node.js agent does not instrument the router used by Sails, so we don't know what to name the transactions whenever an incoming HTTP request is detected. We have a guide you can follow on how to name your transactions properly.
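
A minimal sketch of such manual naming, assuming an Express-style middleware hook in the Sails app (the factory and the req.route field are illustrative assumptions; apm.setTransactionName() is the agent API for this):

```javascript
// Hypothetical middleware factory: takes the APM agent as an argument so
// the sketch is self-contained; in a real app you'd pass in
// require('elastic-apm-node').
function makeTransactionNamer (apm) {
  return function nameTransaction (req, res, next) {
    // Prefer the matched route pattern over the raw URL so that
    // /users/42 and /users/43 land in the same transaction group.
    const route = req.route ? req.route.path : req.path
    apm.setTransactionName(req.method + ' ' + route)
    next()
  }
}

module.exports = makeTransactionNamer
```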


(Jeff "fejalish" Dawson) #20

OK - I found it. It was Babel and/or Node rewriting Error.prepareStackTrace because we have source maps enabled. When I disabled source maps, the stacks showed up fine.

Since we only use source maps for local development and won't run APM in any dev environment, we shouldn't have any issues in the future. And for testing APM locally, I can just disable source maps as a workaround for now. :slight_smile:


Here's more info in case you're interested:

In both node_modules/babel-register/node_modules/source-map-support/source-map-support.js and node_modules/source-map-support/source-map-support.js, Error.prepareStackTrace is rewritten to the following:

// This function is part of the V8 stack trace API, for more info see:
// http://code.google.com/p/v8/wiki/JavaScriptStackTraceApi
function prepareStackTrace(error, stack) {
  if (emptyCacheBetweenOperations) {
    fileContentsCache = {};
    sourceMapCache = {};
  }

  return error + stack.map(function(frame) {
    return '\n    at ' + wrapCallSite(frame);
  }).join('');
}

(Thomas Watson) #21

Ah that explains it. Thanks for sharing.

I think they could easily support running alongside Elastic APM if they just made sure to call the original prepareStackTrace function from within their own. Until then, there unfortunately isn't much we can do.

I'll see if I can get it working with a patch and then submit a PR to the source-map-support module so that others will not have this issue.
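
For the curious, a cooperative version of such an override might look roughly like this (my own sketch, not the actual source-map-support patch; wrapCallSite-style frame translation is elided): remember whatever hook was installed before, and delegate to it instead of always returning a string.

```javascript
// Sketch of a chaining Error.prepareStackTrace override: delegate to the
// previously installed hook so consumers that expect raw CallSite objects
// (like the APM agent) keep working. Not the actual patch.
function installSourceMapHook () {
  const previousPrepare = Error.prepareStackTrace
  Error.prepareStackTrace = function (error, stack) {
    // A real patch would map each frame through its source map here.
    const mapped = stack
    if (typeof previousPrepare === 'function') {
      // Someone installed a hook before us: let them shape the result.
      return previousPrepare(error, mapped)
    }
    // Otherwise fall back to default-style string formatting.
    return String(error) + mapped.map(f => '\n    at ' + f).join('')
  }
}

// A consumer (like the agent) that wants raw callsites installs its hook first:
Error.prepareStackTrace = (err, callsites) => callsites
installSourceMapHook()

// The chained hooks cooperate: .stack is still the CallSite array.
console.log(Array.isArray(new Error('demo').stack)) // true
```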