Node.js ws and APM Server

So, it's entirely possible that my ws version is 'too new' (the docs currently say support goes up to 8.0.0), but I'm not seeing any ws stuff instrumented. Express seems to be working, though.

Also, what would it take to add instrumentation for, say, outgoing fetch/axios/etc. requests? RethinkDB? Or other stuff? Can we make plugins or something? I know we can just push spans/transactions, but I'm wondering if there's a better way to do it, for example for third-party libraries as mentioned.

Also, if I go to APM -> Settings -> Schema, it says I'm running the old binary, even though the Elastic APM integration was installed/added with Fleet.

Kibana version: 8.2.2

Elasticsearch version: 8.2.2

APM Server version: Elastic APM 8.2.0

APM Agent language and version: Node.js, "elastic-apm-node": "^3.34.0"

Original install method (e.g. download page, yum, deb, from source, etc.) and version: elastic/kibana on docker

Fresh install or upgraded from other version?: fresh

Is there anything special in your setup?: no

I actually just noticed this in my console:

{"log.level":"debug","@timestamp":"2022-05-27T15:55:20.096Z","log":{"logger":"elastic-apm-node"},"ecs":{"version":"1.6.0"},"message":"intercepted call to ws.prototype.send"}
{"log.level":"debug","@timestamp":"2022-05-27T15:55:20.096Z","log":{"logger":"elastic-apm-node"},"ecs":{"version":"1.6.0"},"message":"no active transaction found - cannot build new span"}

So it seems like it's at least trying to hook ws; I'm just not sure why it's failing to build the span.

Hello @forbiddenera -- thanks for giving the Node.js Agent a try.

I can't speak to the

Also, if I go to APM -> Settings -> Schema, it says I'm running the old binary, even though the Elastic APM integration was installed/added with Fleet.

issue (I'll ask around), but regarding the Node.js Agent

I'm not seeing any ws stuff instrumented.

The logs you posted are helpful -- thank you. It sounds like you're trying to send a request with ws, but that there's no active transaction to associate the spans with. Transactions are usually started when a Node.js process receives a request in an instrumented framework, or by explicitly starting/stopping a transaction via the API.

If you have a code sample or reproducible test case, that would go a long way towards helping us diagnose your issue. If not (understandable), can you talk a little bit about where you're making these ws.send calls? (In a CLI program, etc.)

Regarding

Also, what would it take to add instrumentation for, say, outgoing fetch/axios/etc. requests? RethinkDB? Or other stuff? Can we make plugins or something?

While the agent is open source, we don't have a formal plugin API. We do have an internal system that looks a lot like one, though. You can see the instrumentation for individual modules here: https://github.com/elastic/apm-agent-nodejs/tree/main/lib/instrumentation/modules -- the TL;DR is that it intercepts module loading and wraps methods, and those wrapped methods create spans. Because it's an internal/private API there's no easy "getting started" guide, but if you're the curious self-starter type, we're happy to answer any questions you might have if you want to enhance or hack on your local Node.js agent, or make a PR into the agent proper.
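To make that "wrap methods, wrapped methods create spans" idea concrete, here is a minimal runnable sketch. Everything here is hypothetical stand-in code, not the agent's actual internals: the real agent hooks require() itself rather than patching an object you hand it, and real spans come from the Agent API rather than this toy startSpan.

```javascript
// Stand-in for the agent's span bookkeeping (hypothetical, for illustration).
const spans = [];
function startSpan(name) {
  const span = { name, ended: false, end() { this.ended = true; } };
  spans.push(span);
  return span;
}

// Generic method wrapper: replaces obj[method] with a version that opens a
// span, delegates to the original, and ends the span when the call returns.
function wrapMethod(obj, method, spanName) {
  const original = obj[method];
  obj[method] = function wrapped(...args) {
    const span = startSpan(spanName);
    try {
      return original.apply(this, args);
    } finally {
      span.end();
    }
  };
}

// A fake "module" standing in for something like ws.prototype.
const fakeWs = {
  send(data) { return `sent:${data}`; }
};

wrapMethod(fakeWs, 'send', 'ws.send');
console.log(fakeWs.send('hello')); // original behavior is preserved
console.log(spans);                // one ended 'ws.send' span was recorded
```

The real instrumentation follows the same shape, just with require-interception in front and the agent's transaction/span machinery behind.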

Let us know on that code sample and we'll do our best to get you those ws spans and transactions.

Hey,

It's a bit difficult to give code as it's part of a relatively complex app, but I can break down what's happening, and hopefully that helps, as well as include the applicable snippets.

Upon starting the app, the service-management code forks off a new instance, the 'clusterMaster', and detaches the console. Using the cluster module, clusterMaster launches web worker threads. These threads create the HTTPS server and listen for requests (the cluster module handles distribution/load balancing of requests to each worker thread, which includes WebSocket connections, of course, since they start as HTTPS requests). Regular requests get routed through Express, which is showing up fine. WebSocket requests connect to the same HTTPS server, are authenticated on upgrade, and then the ws connection stays open for the remainder of the client's (browser) session. Messages are passed back and forth, mostly using a subscription model, but ultimately sent simply with ws.send(), of course.

Not using any other common frameworks; just node, express, and ws. Nothing like React/Angular, transpilation, or webpack (:smiley:) crap. Not using any other libraries listed anywhere on the compatibility page.

My initial instinct was that maybe my usage of the cluster module is interfering.

Some applicable snippets:

clusterMaster.js

launchWorkers() {
	return new Promise((resolve, reject) => {
		for (let i = 0; i < $configuration.service.threads; i++) {
			let worker = cluster.fork({ 'SERVICE_INSTANCE_UUID': this.instanceId });
			// ... //
		}
		// ... //
	});
}

clusterWorker.js

async start() {
	// ... //
	this.webHandler.startListening();
}

webHandler.js:

startListening() {
	return new Promise(async (resolve, reject) => {
		// ... //
		this.wsServer = new ws.Server({
			noServer:true, // we don't pass the http server here to allow async handleUpgrade!
			clientTracking: true,
			maxPayload: $configuration.service.wsMaxPayload
		});
		// ... //
		this.server = this.httpServer.createServer({ key, cert, ca });

		this.wsHandler = new wsHandler(this.wsServer);
		this.server.on('upgrade', this.handleUpgrade.bind(this));
		// ... //

		this.wsServer.on('connection', this.wsHandler.handleConnection.bind(this.wsHandler));
		this.server.on('request', routeHandler.app); // routeHandler.app is an express 'app'

		this.server.listen({
			host: $configuration.service.host,
			port: $configuration.service.port,
			signal: this.controller.signal
		}, () => {
			// ... //
			resolve();
		});
	});
}
async handleUpgrade(req, socket, head) {
	try {
		// websocket Authentication goes here; throw if failed or do below if success:
		this.wsServer.handleUpgrade(req, socket, head, (ws) => {
			this.wsServer.emit('connection', ws, req);
		});
	} catch(e) {
		socket.write('HTTP/1.1 401 WebSocket Protocol Handshake Failure\r\n\r\n');
		socket.destroy();
	}
}

webSocketHandler.js

handleConnection(ws, req) {
	// ... //
	ws.on('message', this.handleMessage.bind(this, ws));
	// ... //
}

sendMessage(message, ws) {
	// ... //
	ws.send(JSON.stringify(message));
	// ... //
}

Hopefully this helps clear it up.

As for instrumenting other libs, looking at the API docs, it seems apm.addPatch etc. could be used to sort of make modules for other libraries? Obviously this would be more at the application level rather than included as 'plugins' at the library level; I suppose either direction could be useful depending on the need, but I'll have to dig deeper, I guess. Perhaps using a Proxy on some classes, with code that creates the transactions/spans, would do, at least for some things; it sounds not much different from what you explained. Curious whether you already use Proxy or something else.
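For what it's worth, the Proxy approach can work at the application level. A minimal runnable sketch follows; the `recorder` object is a hypothetical stand-in for the agent (in real code you would create spans via the Agent API instead):

```javascript
// Record of intercepted calls; stands in for real span creation.
const recorder = { calls: [] };

// Wrap an object in a Proxy so every method call is noted before delegating.
function instrument(target, prefix) {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      if (typeof value !== 'function') return value;
      return function (...args) {
        recorder.calls.push(`${prefix}.${String(prop)}`);
        return value.apply(obj, args); // delegate to the original method
      };
    }
  });
}

// Hypothetical application class, for illustration only.
class Repo {
  find(id) { return { id }; }
}

const repo = instrument(new Repo(), 'Repo');
console.log(repo.find(42));   // behaves like the unwrapped object
console.log(recorder.calls);  // ['Repo.find']
```

One trade-off of the Proxy route: it only covers calls made through the proxied reference, so internal `this.method()` calls inside the class bypass it, whereas prototype patching catches those too.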

apm.registerMetric looks potentially useful too, although I wish there was a bit more information about it, such as how its data gets logged/saved compared to transactions/spans.
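As a rough mental model (hedged, since the sampling and reporting details are the agent's internals): registerMetric hands the agent a callback that it polls on its metrics interval and reports as a gauge, separate from the transaction/span stream. A standalone sketch with a stub agent, where `collect()` is a hypothetical stand-in for the agent's periodic sampling:

```javascript
// Stub standing in for elastic-apm-node so this runs on its own.
const apm = {
  metrics: new Map(),
  registerMetric(name, callback) {
    this.metrics.set(name, callback);
  },
  // Stand-in for the agent's periodic metrics collection cycle.
  collect() {
    const sample = {};
    for (const [name, callback] of this.metrics) sample[name] = callback();
    return sample;
  }
};

let openSockets = 0;
apm.registerMetric('app.websocket.open_connections', () => openSockets);

openSockets = 3; // e.g. incremented/decremented in connection handlers
console.log(apm.collect()); // { 'app.websocket.open_connections': 3 }
```

The metric name here is made up for the example; the real agent samples registered callbacks on its own schedule and ships the values alongside its other metricsets.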

Also: I noticed that the APM Server config seemed to force the default URL to http? It caught me off guard for a second during setup that it was still set to http even after stuffing in a cert. (Edit APM Integration -> General -> Server Configuration -> URL)

Re: the other issue I mentioned:


Hey,

about the apm-server:

I see that the integration has been installed. How is the apm-server running? Is it running independently in its own docker container? Or, did you add the apm integration to an agent policy, and the elastic-agent assigned to that policy started and is now managing the apm-server?

The latter; it's installed as an integration on elastic agents.

I can confirm this behavior on both 8.2.x and 8.3.0. Is the actual behavior deviating in any way from what you would expect, or preventing you from managing the APM Server via the agent policy? I'm hoping this is just a visual bug.

EDIT: After some light investigation, it appears to be a visual bug and should not affect the actual APM integration. If you find that something is in fact wrong with the integration, let us know. Thanks for reporting this!

Re: your last reply: if you're specifically referring to the issue where the view says I'm running the old binary version, then yes, I believe it's probably just a visual bug. Other than the seemingly incorrect display in that particular view, the issue seems isolated to the view and doesn't affect management or usage of the APM server. If you meant something else, please let me know.

I didn't know 8.3 was out, so I haven't tried it with anything yet, but I can check that out soon.

It's a bit difficult to give code as it's part of a relatively complex app, but I can break down what's happening and hopefully that helps, as well as trying to include the applicable snippets..

Thanks @forbiddenera -- that sounds like a pretty neat architecture. Here are some guiding principles and advice that should help you along the way.

Re: cluster and worker threads. Each forked cluster process will end up with its own agent instance. This will generally work OK but transactions or spans won't (at least automatically) cross process boundaries. That is -- if you start a transaction in one process and fork to another process to do some work, that second process won't know about the first transaction. You'll need to start a new transaction in the forked process.

Worker threads should work the same, although in all honesty we don't have a lot of field data from folks using worker threads. If you hit any weirdness we would love to hear about it.

All that said, I don't think the problem here is (at least directly) all your concurrency. Our docs could be a bit clearer about this, but when you're working with WebSockets we instrument the outgoing sends automatically; we don't do any automatic transaction handling for WebSocket servers. You'll need to start/end a custom transaction manually. You can read a bit about custom transactions here,

If for instance your application is a background job processing worker or is only accepting WebSockets, you’ll need to manually start and end transactions.

as well as in the docs for the startTransaction method.
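To sketch what that looks like for a ws message handler: the `apm` object below is a stub so the snippet runs standalone, but the startTransaction/end names mirror the real Agent API; in a real app you would require and start elastic-apm-node before anything else, and the spans would come from the agent's own ws interception.

```javascript
// Stub standing in for elastic-apm-node, for illustration only.
const apm = {
  transactions: [],
  current: null,
  startTransaction(name, type) {
    const tx = {
      name, type, ended: false,
      end: () => { tx.ended = true; apm.current = null; }
    };
    apm.transactions.push(tx);
    apm.current = tx;
    return tx;
  }
};

// With a transaction active, the agent's intercepted ws.send would attach a
// span to it instead of logging "no active transaction found".
function handleMessage(ws, raw) {
  const transaction = apm.startTransaction('ws.message', 'websocket');
  try {
    const message = JSON.parse(raw);
    ws.send(JSON.stringify({ echo: message })); // send happens inside the tx
  } finally {
    transaction.end(); // always end, even if parsing/handling throws
  }
}

// Fake socket for demonstration.
const fakeSocket = { sent: [], send(data) { this.sent.push(data); } };
handleMessage(fakeSocket, '{"hello":"world"}');
console.log(apm.transactions[0].ended); // true
console.log(fakeSocket.sent[0]);
```

The transaction name/type strings above are just examples; pick whatever naming makes your ws traffic easy to find in the APM UI.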

The above should give you the tools you need to instrument your WebSocket requests. If you have specific questions, let us know here or in the Node Agent repo. Thanks again for giving the agent and Elastic APM a spin.

Re: cluster and worker threads. Each forked cluster process will end up with its own agent instance. This will generally work OK but transactions or spans won't (at least automatically) cross process boundaries. That is -- if you start a transaction in one process and fork to another process to do some work, that second process won't know about the first transaction. You'll need to start a new transaction in the forked process.

Nothing like that happens; the service manager forks off a daemon, and that daemon forks out workers with cluster. I'm not using any actual worker_threads; I know the terminology is confusing when there's forking, clustering, workers, and worker threads!

All that said, I don't think the problem here is (at least directly) all your concurrency. Our docs could be a bit clearer about this, but when you're working with WebSockets we instrument the outgoing sends automatically; we don't do any automatic transaction handling for WebSocket servers. You'll need to start/end a custom transaction manually. You can read a bit about custom transactions here,

So, if I understand correctly, while outgoing sends are instrumented, they wouldn't be logged unless a transaction has been started?

Will be working on a v2 frontend soon and will be playing with this more with it on both sides.

Back in my day, words used to mean things :wink:

So, if I understand correctly, while outgoing sends are instrumented, they wouldn't be logged unless a transaction has been started?

That's correct. For the common case with those web socket requests you'll need to manually start/end a transaction.

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.