[APM-agent-go] Some Error occurred while calling apmsql/conn.go's finishSpan method


(Barry Zhong) #1

I am using the 1.0 release of apm-agent-go on our project at test environments. And there are some errors of same type occur sometime as following:

{
  "library_frame": true,
  "exclude_from_grouping": false,
  "filename": "gocontext.go",
  "abs_path": "/buildpath/src/code.com/demo/vendor/github.com/elastic/apm-agent-go/gocontext.go",
  "line": {
    "number": 64
  },
  "module": "code.com/demo/vendor/github.com/elastic/apm-agent-go",
  "function": "CaptureError"
},
{
  "library_frame": true,
  "exclude_from_grouping": false,
  "filename": "conn.go",
  "abs_path": "/buildpath/src/code.com/demo/vendor/github.com/elastic/apm-agent-go/module/apmsql/conn.go",
  "line": {
    "number": 71
  },
  "module": "code.com/demo/vendor/github.com/elastic/apm-agent-go/module/apmsql",
  "function": "(*conn).finishSpan"
},
{
  "library_frame": true,
  "exclude_from_grouping": false,
  "filename": "conn.go",
  "abs_path": "/buildpath/src/code.com/demo/vendor/github.com/elastic/apm-agent-go/module/apmsql/conn.go",
  "line": {
    "number": 93
  },
  "module": "code.com/demo/vendor/github.com/elastic/apm-agent-go/module/apmsql",
  "function": "(*conn).QueryContext"
},
{
  "library_frame": true,
  "exclude_from_grouping": false,
  "abs_path": "/usr/local/go/src/database/sql/ctxutil.go",
  "filename": "ctxutil.go",
  "line": {
    "number": 48
  },
  "module": "database/sql",
  "function": "ctxDriverQuery"
},
{
  "library_frame": true,
  "exclude_from_grouping": false,
  "filename": "sql.go",
  "abs_path": "/usr/local/go/src/database/sql/sql.go",
  "line": {
    "number": 1320
  },
  "module": "database/sql",
  "function": "(*DB).queryDC.func1"
},
{
  "library_frame": true,
  "exclude_from_grouping": false,
  "filename": "sql.go",
  "abs_path": "/usr/local/go/src/database/sql/sql.go",
  "line": {
    "number": 2867
  },
  "function": "withLock",
  "module": "database/sql"
},
{
  "library_frame": true,
  "exclude_from_grouping": false,
  "filename": "sql.go",
  "abs_path": "/usr/local/go/src/database/sql/sql.go",
  "line": {
    "number": 1319
  },
  "module": "database/sql",
  "function": "(*DB).queryDC"
},
{
  "library_frame": true,
  "exclude_from_grouping": false,
  "filename": "sql.go",
  "abs_path": "/usr/local/go/src/database/sql/sql.go",
  "line": {
    "number": 1304
  },
  "function": "(*DB).query",
  "module": "database/sql"
},
{
  "library_frame": true,
  "exclude_from_grouping": false,
  "abs_path": "/usr/local/go/src/database/sql/sql.go",
  "filename": "sql.go",
  "line": {
    "number": 1281
  },
  "module": "database/sql",
  "function": "(*DB).QueryContext"
},
{
  "exclude_from_grouping": false,
  "library_frame": true,
  "filename": "dbr.go",
  "abs_path": "/buildpath/src/code.com/demo/vendor/git.xiaojukeji.com/houxinzhu/dbr/dbr.go",
  "line": {
    "number": 161
  },
  "module": "code.com/demo/vendor/git.xiaojukeji.com/houxinzhu/dbr",
  "function": "query"
},
{
  "library_frame": true,
  "exclude_from_grouping": false,
  "filename": "select_builder.go",
  "abs_path": "/buildpath/src/code.com/demo/vendor/git.xiaojukeji.com/houxinzhu/dbr/select_builder.go",
  "line": {
    "number": 94
  },
  "module": "code.com/demo/vendor/git.xiaojukeji.com/houxinzhu/dbr",
  "function": "(*SelectBuilder).LoadContext"
},
{
  "exclude_from_grouping": false,
  "filename": "resource.go",
  "abs_path": "/buildpath/src/code.com/demo/server/model/resource.go",
  "line": {
    "number": 423
  },
  "module": "code.com/demo/server/model",
  "function": "(*Resource).GetTrashByDeadline"
},
{
  "exclude_from_grouping": false,
  "filename": "deletetrash.go",
  "abs_path": "/buildpath/src/code.com/demo/server/services/plugins/deletetrash.go",
  "line": {
    "number": 40
  },
  "function": "(*TrashConfig).cronDeleteTrash",
  "module": "code.com/demo/server/services/plugins"
},
{
  "exclude_from_grouping": false,
  "filename": "deletetrash.go",
  "abs_path": "/buildpath/src/code.com/demo/server/services/plugins/deletetrash.go",
  "line": {
    "number": 28
  },
  "module": "code.com/demo/server/services/plugins",
  "function": "(*TrashConfig).CronDeleteTrash"
},
{
  "exclude_from_grouping": false,
  "abs_path": "/buildpath/src/code.com/demo/server/services/plugins/deletetrash.go",
  "filename": "deletetrash.go",
  "line": {
    "number": 81
  },
  "module": "code.com/demo/server/services/plugins",
  "function": "(*TrashConfig).CronDeleteTrash-fm"
},
{
  "exclude_from_grouping": false,
  "filename": "cron.go",
  "abs_path": "/buildpath/src/code.com/demo/server/common/cron/cron.go",
  "line": {
    "number": 33
  },
  "function": "Cron.CronJob.func1",
  "module": "code.com/demo/server/common/cron"
},
{
  "exclude_from_grouping": false,
  "library_frame": true,
  "filename": "cron.go",
  "abs_path": "/buildpath/src/code.com/demo/vendor/github.com/robfig/cron/cron.go",
  "line": {
    "number": 83
  },
  "module": "code.com/demo/vendor/github.com/robfig/cron",
  "function": "FuncJob.Run"
},
{
  "library_frame": true,
  "exclude_from_grouping": false,
  "filename": "asm_amd64.s",
  "abs_path": "/usr/local/go/src/runtime/asm_amd64.s",
  "line": {
    "number": 2337
  },
  "function": "goexit",
  "module": "runtime"
}    

Any clue of what would be the problem? Thanks.


(Andrew Wilkins) #2

module/apmsql reports the errors reutrned by the SQL query. On the errors page there should be an "Exception message" at the top. What does it say?

Perhaps the apmsql code is a bit too eager, and is reporting errors it should not.


(Barry Zhong) #3

Highlighted as "Bad Connection."


(Andrew Wilkins) #4

Thanks, I think I see the issue.

There is a special error (database/sql/driver.ErrBaddConn) that database/sql drivers can return to indicate that a connection is bad, and a new connection should be made. I'll change apmsql to not report these errors. We should also not report database/sql.ErrNoRows; I'll update the code to ignore that while I'm there.


(Andrew Wilkins) #5

@Barry_Zhong I have filed https://github.com/elastic/apm-agent-go/issues/345, which you can subscribe to for updates. Thanks for reporting the issue!


(Barry Zhong) #6

Sometime Highlighted as "context canceled"


(Andrew Wilkins) #7

Thanks. I've opened https://github.com/elastic/apm-agent-go/issues/347 for that.


(Barry Zhong) #8

Hey @axw the "context canceled" one may not be an issue, we are using a wrong context instance, which will be canceled due to http request process complete, inside a forked goroutine. Sorry for that, and the error information is still valuable, since it indicates there is a functional bug. Such as the sql statement running with a canceled context will return unexpected result.


(Andrew Wilkins) #9

For "context canceled", the APM agent can't tell an expected error from an unexpected one. I think expected "context canceled" errors would be the norm, which is why I went ahead and made the change.

In this case would you see an error further up the stack in the business logic?


(Andrew Wilkins) #10

@Barry_Zhong not sure if you saw my reply, just pinging in case you did not.

I'd also restate what I said, in case I was not clear. I think in most cases that cancelling context would be intentional, rather than a coding error. Because of that, I think it makes sense to ignore such errors, which are used for control flow rather than to indicate a problem.

In your business logic, i.e. in your cronDeleteTrash code, I think you probably don't expect context to be cancelled - right? So can you instead report logic errors up to APM in that code?


(Barry Zhong) #11

@axw Andrew, thanks for keeping tight communications with us.

Conclusion: I do not think it is apm-agent-go's issue. It is our developer mis-use a context, which created from another http handler goroutine, which will be canceled very soon.

Indication: it seems that not every golang programmer uses Context in a right manner. I am not even a golang programmer :stuck_out_tongue_winking_eye:

One thing bothered me a lot is, when I want to apply apm-go-agent into our golang application, I had to face those unknown rabbit holes, and people felt nervous to lift apm as a production solution :(.


(Andrew Wilkins) #12

@Barry_Zhong thanks for your reply.

Conclusion: I do not think it is apm-agent-go's issue. It is our developer mis-use a context, which created from another http handler goroutine, which will be canceled very soon.

Indication: it seems that not every golang programmer uses Context in a right manner. I am not even a golang programmer :stuck_out_tongue_winking_eye:

OK. I think what might help is if we publish some more in-depth examples covering a variety of coding patterns, such as goroutine pools like you have in your application. The focus of these examples would be (a) when/how to report transactions, spans, and errors, and (b) how to propagate context.

I have opened https://github.com/elastic/apm-agent-go/issues/385 to track this.

One thing bothered me a lot is, when I want to apply apm-go-agent into our golang application, I had to face those unknown rabbit holes, and people felt nervous to lift apm as a production solution :(.

Understood, and I'm sorry it hasn't gone smoother. I hope the value you're seeing outweighs the difficulties you've encountered so far.

Don't hesitate to reach out again if you run into any other issues. This is how we can make it better! :slight_smile: