A null pointer exception is possible when performing asynchronous requests

1.6.1 - java
APM Agent language and version:

My app seems to fall over sometimes with "LocalizedMessage":"Request cannot be executed; I/O reactor status: STOPPED"

I've been digging into it over the last day or so. Under load, my app seems to get into a state where for some reason everything gets ugly. I'm not sure if it's dependency hell, or just something generally not handling life. In this case, the NPE comes from HttpAsyncRequestProducerWrapper.

The weird thing is, the stack trace doesn't report version or anything...

"thrown":{
"commonElementCount":0,
"localizedMessage":"Error during GET https://webservice.apm.com.au/api/property/v2/state/QLD/suburb/Mansfield/streetname/Dirkala/streetnumber/52/valuation?streettype=St&postcode=4122",
"message":"Error during GET https://webservice.apm.com.au/api/property/v2/state/QLD/suburb/Mansfield/streetname/Dirkala/streetnumber/52/valuation?streettype=St&postcode=4122",
"name":"com.pds.common.http.PDSHttpClientException",
"cause":{
"commonElementCount":1,
"name":"java.lang.NullPointerException",
"extendedStackTrace":[
{
"class":"co.elastic.apm.agent.httpclient.HttpAsyncRequestProducerWrapper",
"method":"close",
"file":"HttpAsyncRequestProducerWrapper.java",
"line":117,
"exact":false,
"location":"?",
"version":"?"
},
{
"class":"org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl",
"method":"releaseResources",
"file":"DefaultClientExchangeHandlerImpl.java",
"line":84,
"exact":false,
"location":"httpasyncclient-4.1.1.jar",
"version":"4.1.1"
},
{
"class":"org.apache.http.impl.nio.client.AbstractClientExchangeHandler",
"method":"close",
"file":"AbstractClientExchangeHandler.java",
"line":399,
"exact":false,
"location":"httpasyncclient-4.1.1.jar",
"version":"4.1.1"
},
{
"class":"org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl",
"method":"responseCompleted",
"file":"DefaultClientExchangeHandlerImpl.java",
"line":182,
"exact":false,
"location":"httpasyncclient-4.1.1.jar",
"version":"4.1.1"
},
{
"class":"org.apache.http.nio.protocol.HttpAsyncRequestExecutor",
"method":"processResponse",
"file":"HttpAsyncRequestExecutor.java",
"line":432,
"exact":false,
"location":"httpcore-nio-4.4.4.jar",
"version":"4.4.4"
},
{
"class":"org.apache.http.nio.protocol.HttpAsyncRequestExecutor",
"method":"inputReady",
"file":"HttpAsyncRequestExecutor.java",
"line":325,
"exact":false,
"location":"httpcore-nio-4.4.4.jar",
"version":"4.4.4"
},
{
"class":"org.apache.http.impl.nio.DefaultNHttpClientConnection",
"method":"consumeInput",
"file":"DefaultNHttpClientConnection.java",
"line":267,
"exact":false,
"location":"httpcore-nio-4.4.4.jar",
"version":"4.4.4"
},
{
"class":"org.apache.http.impl.nio.client.InternalIODispatch",
"method":"onInputReady",
"file":"InternalIODispatch.java",
"line":81,
"exact":false,
"location":"httpasyncclient-4.1.1.jar",
"version":"4.1.1"
},
{
"class":"org.apache.http.impl.nio.client.InternalIODispatch",
"method":"onInputReady",
"file":"InternalIODispatch.java",
"line":39,
"exact":false,
"location":"httpasyncclient-4.1.1.jar",
"version":"4.1.1"
},
{
"class":"org.apache.http.impl.nio.reactor.AbstractIODispatch",
"method":"inputReady",
"file":"AbstractIODispatch.java",
"line":123,
"exact":false,
"location":"httpcore-nio-4.4.4.jar",
"version":"4.4.4"
},
{
"class":"org.apache.http.impl.nio.reactor.BaseIOReactor",
"method":"readable",
"file":"BaseIOReactor.java",
"line":164,
"exact":false,
"location":"httpcore-nio-4.4.4.jar",
"version":"4.4.4"
},
{
"class":"org.apache.http.impl.nio.reactor.AbstractIOReactor",
"method":"processEvent",
"file":"AbstractIOReactor.java",
"line":339,
"exact":false,
"location":"httpcore-nio-4.4.4.jar",
"version":"4.4.4"
},
{
"class":"org.apache.http.impl.nio.reactor.AbstractIOReactor",
"method":"processEvents",
"file":"AbstractIOReactor.java",
"line":317,
"exact":false,
"location":"httpcore-nio-4.4.4.jar",
"version":"4.4.4"
},
{
"class":"org.apache.http.impl.nio.reactor.AbstractIOReactor",
"method":"execute",
"file":"AbstractIOReactor.java",
"line":278,
"exact":false,
"location":"httpcore-nio-4.4.4.jar",
"version":"4.4.4"
},
]
},

I had a brief look at the code, and it does look like some null checking wouldn't go astray - it does seem to be mostly under load - a single request won't fail, but if i use a tool to generate some load, then this failure may randomly happen - might suggest that some extra checking is warranted in the wrapper?

Hi again Paul and thanks for reporting! :slight_smile:

A null check will make this specific reporting disappear, but if there is a real concurrency issue, where we just fail to close the wrapped Producer, this may lead to countless indirect implications that will be very difficult to debug, so I would avoid that.

Since you report this happens only in high loads, I suspect it is a concurrency issue. As part of our attempts to optimise performance, we tried to save some volatile reads/writes by maintaining an order of access to fields, however I think in this case the Reactor may switch threads multiple times, and maybe that's the issue.
I will make a quick fix for that and ask you to test without doing a null check, so we can have an indication if that's indeed the case. So it would be kind of a test snapshot that may fail, I hope you are OK with that.

In any case, for the final solution I will add a null check, but not inside the wrapper, rather a null check at the point where we decide to wrap.

I'm definitely happy to test it, it's my top priority at the moment.

In terms of 'high load' it's only about 100 requests in 10 seconds currently before it dies. I'm really hoping we can throw significantly more load than that at it and have it remain stable.

We're not currently tweaking reporting frequency or anything, but happy to look at any settings if you believe they'd help...

Actually, there was another (more probable) race issue. I handled both. Please test this build.

I've had a good play with that this morning, and it looks good!

I sent 30k requests at various speeds, and it held up, and kept logging to APM.

I'm happy to call that fixed, thanks!

Awesome! :tada:
1.7.0 should be released very soon, maybe even today, and would contain both fixes.
Thanks for your input!

1 Like