Apm-agent-go v0.5.2 panic with mu.Unlock

I am using apm-agent-go inside my project, currently I got a crash when loading my application. I am not sure if it is the application's issue or apm-agent-go's issue. Any clue?

fatal error: sync: unlock of unlocked mutex

goroutine 77 [running]:
runtime.throw(0xe6b88d, 0x1e)
	/usr/local/go/src/runtime/panic.go:616 +0x81 fp=0xc4295b8e40 sp=0xc4295b8e20 pc=0x42c801
sync.throw(0xe6b88d, 0x1e)
	/usr/local/go/src/runtime/panic.go:605 +0x35 fp=0xc4295b8e60 sp=0xc4295b8e40 pc=0x42c775
	/usr/local/go/src/sync/mutex.go:184 +0xc2 fp=0xc4295b8e88 sp=0xc4295b8e60 pc=0x470ce2
code.xiaojukeji.com/cloud-disk-server/vendor/github.com/elastic/apm-agent-go.(*Transaction).StartSpan(0xc424f11600, 0xc421db8e40, 0x14, 0xc420294ac0, 0xe, 0x0, 0xc421db8e40)
	/go/src/code.xiaojukeji.com/cloud-disk-server/vendor/github.com/elastic/apm-agent-go/span.go:44 +0x165 fp=0xc4295b8ef0 sp=0xc4295b8e88 pc=0x9bad25
code.xiaojukeji.com/cloud-disk-server/vendor/github.com/elastic/apm-agent-go.StartSpan(0xf20460, 0xc428c51bf0, 0xc421db8e40, 0x14, 0xc420294ac0, 0xe, 0xd1ec20, 0xc4240406ce, 0x7fb023550db0)
	/go/src/code.xiaojukeji.com/cloud-disk-server/vendor/github.com/elastic/apm-agent-go/gocontext.go:41 +0xa1 fp=0xc4295b8f50 sp=0xc4295b8ef0 pc=0x9b79f1
code.xiaojukeji.com/cloud-disk-server/vendor/github.com/elastic/apm-agent-go/module/apmsql.(*conn).startSpan(0xc4237909c0, 0xf20460, 0xc428c51bf0, 0xc421db8e40, 0x14, 0xc420294ac0, 0xe, 0xc4240406c0, 0x26, 0x0, ...)
	/go/src/code.xiaojukeji.com/cloud-disk-server/vendor/github.com/elastic/apm-agent-go/module/apmsql/conn.go:49 +0x81 fp=0xc4295b9040 sp=0xc4295b8f50 pc=0x9d96e1
code.xiaojukeji.com/cloud-disk-server/vendor/github.com/elastic/apm-agent-go/module/apmsql.(*conn).startStmtSpan(0xc4237909c0, 0xf20460, 0xc428c51bf0, 0xc4240406c0, 0x26, 0xc420294ac0, 0xe, 0xcf1520, 0x3000000000000000, 0xc4295b9328)
	/go/src/code.xiaojukeji.com/cloud-disk-server/vendor/github.com/elastic/apm-agent-go/module/apmsql/conn.go:45 +0xa6 fp=0xc4295b90b0 sp=0xc4295b9040 pc=0x9d9616
code.xiaojukeji.com/cloud-disk-server/vendor/github.com/elastic/apm-agent-go/module/apmsql.(*conn).QueryContext(0xc4237909c0, 0xf20460, 0xc428c51bf0, 0xc4240406c0, 0x26, 0x14a6430, 0x0, 0x0, 0x0, 0x0, ...)

Hi Barry,

This could happen if you start a span referring to a transaction that has already been ended, like this:

tx := elasticapm.DefaultTracer.StartTransaction("name", "type")
ctx := elasticapm.ContextWithTransaction(context.Background(), tx)
elasticapm.StartSpan(ctx, "name", "type") // <- invalid after tx.End()

I see that the span is being created by module/apmsql. Is the query being run within a web request handler?

Yes, inside gin.

OK, one possibility is that your request handler is starting another goroutine which runs database queries, and you're passing the request context to that. The database queries are being run after the request completes, which will cause the tracing to attempt to start a span after the transaction has finished.

Does that sound plausible?

It seems that we are using an asynchronous case(work pool), the error show up in work pool thread, it matches what you described. let me dive in. Thx

1 Like

the basic process is like: req -> pool -> response, but the context is timed out while pooling, it seems like the context is timed out, but the transaction is still calling startSpan. There supposed to be design issues, but what's your suggestion for quick solution?

I see, thank you. I have created Investigate safeguards for apm.StartSpan when ctx containing ended transaction/span · Issue #317 · elastic/apm-agent-go · GitHub to track this issue.

As a quick fix, you could create an intermediate span at a point where you know the request hasn't finished/timed out, and add that span to the context, but don't call its End method until you know no more queries will be sent.

So the order of events would be like:

tx := elasticapm.DefaultTracer.StartTransaction("name", "type")
ctx := elasticapm.ContextWithTransaction(context.Background(), tx)
intermediateSpan, ctx2 := elasticapm.StartSpan(ctx, "intermediate", "span")
elasticapm.StartSpan(ctx2, "name", "type").End()

This should work because the transaction is not referenced after the call to StartSpan returns.

Thx Andrew. Similarly, I am using following code at the beginning of WorkPool tasks, and it works as well.:yum:

tx := elasticapm.DefaultTracer.StartTransaction(name, "WorkPool")
defer tx.End()
context := elasticapm.ContextWithTransaction(context.Background(),  tx)

So that those dbr/sql/http instrumenting works as expected. Eventually, those tx and spans showed up at Dashboard as a dedicated 'Type' -> WorkPool in the middle of Request and Errors tabs.

Great! You won't be able to see the SQL spans within the HTTP request transaction, but this seems like a good quick fix. Keep an eye on the issue I linked, I hope to have a solution in the agent in the near future.

@Barry_Zhong FYI I've just merged https://github.com/elastic/apm-agent-go/pull/319 which will go into the next release. This change will eliminate panics in situations like you described by treating call to StartSpan with an ended transaction as a no-op.

@axw Awesome!

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.