NullPointerException in scroll

I noticed this error today in the logs of our 5.4 cluster while running a scroll query. The same query ran with no issues before this exception and has run with no issues since, and I have not been able to reproduce it.

Has anyone seen this before? Any idea what could be causing it?

[2017-08-29T11:17:29,137][DEBUG][o.e.a.s.TransportSearchScrollAction] [data1-iil-003] [569527] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [data1-iil-003][x.x.x.x:9300][indices:data/read/search[phase/query+fetch/scroll]]
Caused by: java.lang.NullPointerException
at org.elasticsearch.search.DefaultSearchContext.isCancelled(DefaultSearchContext.java:810) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:138) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:294) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:391) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.search.SearchTransportService$10.messageReceived(SearchTransportService.java:374) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.search.SearchTransportService$10.messageReceived(SearchTransportService.java:371) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:627) [elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.4.0.jar:5.4.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_92]
[2017-08-29T11:17:29,179][WARN ][r.suppressed ] path: /_search/scroll, params: {scroll=300000ms}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
at org.elasticsearch.action.search.SearchScrollQueryAndFetchAsyncAction.onPhaseFailure(SearchScrollQueryAndFetchAsyncAction.java:159) [elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.search.SearchScrollQueryAndFetchAsyncAction.access$300(SearchScrollQueryAndFetchAsyncAction.java:42) [elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.search.SearchScrollQueryAndFetchAsyncAction$1.onFailure(SearchScrollQueryAndFetchAsyncAction.java:146) [elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:51) [elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1041) [elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1145) [elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1123) [elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.transport.TransportService$7.onFailure(TransportService.java:638) [elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:623) [elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39) [elasticsearch-5.4.0.jar:5.4.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_92]
Caused by: org.elasticsearch.transport.RemoteTransportException: [data1-iil-003][x.x.x.x:9300][indices:data/read/search[phase/query+fetch/scroll]]
Caused by: java.lang.NullPointerException
at org.elasticsearch.search.DefaultSearchContext.isCancelled(DefaultSearchContext.java:810) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:138) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:294) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:391) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.search.SearchTransportService$10.messageReceived(SearchTransportService.java:374) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.search.SearchTransportService$10.messageReceived(SearchTransportService.java:371) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:627) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.4.0.jar:5.4.0]
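
For reference, the scroll in question follows the standard pattern; here is a minimal sketch using the 5.x TransportClient (the index name, page size, and match-all query are placeholders, not our actual job):

    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.index.query.QueryBuilders;

    public class ScrollSketch {
        static void scrollAll(Client client) {
            // 5-minute keep-alive, matching the scroll=300000ms in the log above
            TimeValue keepAlive = TimeValue.timeValueMinutes(5);
            SearchResponse resp = client.prepareSearch("my-index")   // placeholder index
                    .setQuery(QueryBuilders.matchAllQuery())         // placeholder query
                    .setScroll(keepAlive)
                    .setSize(1000)
                    .get();
            while (resp.getHits().getHits().length > 0) {
                // ... process the current page of hits ...
                // each scroll round renews the keep-alive for another 5 minutes
                resp = client.prepareSearchScroll(resp.getScrollId())
                        .setScroll(keepAlive)
                        .get();
            }
            client.prepareClearScroll().addScrollId(resp.getScrollId()).get();
        }
    }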

This is a DEBUG-level message and you should not be worried about it.

Also, enabling DEBUG logging in production is not recommended; you should only do it when investigating specific issues.
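
If you do need it temporarily, scope it to a single logger through the cluster settings API rather than enabling DEBUG globally. Something like this with the 5.x TransportClient should work (the logger name below is taken from the o.e.a.s.TransportSearchScrollAction entry in your log):

    import org.elasticsearch.client.Client;
    import org.elasticsearch.common.settings.Settings;

    public class LoggerSettings {
        static void enableScrollDebug(Client client) {
            // transient setting: lasts until it is reset or the cluster fully restarts
            client.admin().cluster().prepareUpdateSettings()
                    .setTransientSettings(Settings.builder()
                            .put("logger.org.elasticsearch.action.search", "DEBUG")
                            .build())
                    .get();
        }
    }

You can revert it later with putNull on the same settings key.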

Cheers

We don't have the DEBUG logging level enabled, yet this error appeared in the logs anyway. Also, I would not be worried about it if it didn't cause client queries to fail with an NPE, but it does.

Well, the error is a DEBUG-level log message. There is no way for it to show up in the logs if the DEBUG level is not enabled.

But if you have a client that is receiving the error, that explains why it is only logged at DEBUG on the server.

That NPE is caused by the reference to the SearchTask being null. Maybe @Igor_Motov could shed some light here?
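
To illustrate (this is a hypothetical sketch, not the actual DefaultSearchContext source): if isCancelled() dereferences the task unconditionally, any scroll round where the SearchTask was never attached will throw exactly this NPE, whereas a null-safe check would treat a missing task as not cancelled:

    import org.elasticsearch.action.search.SearchTask;

    // Hypothetical sketch of the suspected failure mode; field and class
    // names are assumptions, not copied from the 5.4.0 source.
    final class CancellationCheck {
        private final SearchTask task; // suspected to be null for the failing scroll round

        CancellationCheck(SearchTask task) {
            this.task = task;
        }

        boolean isCancelled() {
            // "return task.isCancelled();" would NPE when task is null;
            // the null-safe variant treats a missing task as "not cancelled"
            return task != null && task.isCancelled();
        }
    }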

I saw something like this recently in https://github.com/elastic/elasticsearch/issues/25820. We are still trying to figure out what's causing it. I suspect that it might have something to do with scroll timeouts, but I am not totally sure about the concrete mechanism.

@Dan_Markhasin do you see any other messages in the log files on the other nodes around the same time? Which JVM and platform are you using to run Elasticsearch?

We are running Java 1.8.0_92 on SLES 11 SP4.

I do not know if this is related to scroll timeouts; from our logs I can see that the query that triggered this condition failed after ~2 seconds.
Also, unlike the GitHub issue, there were no other errors in the logs on any of the machines except the one I pasted above; in fact, these were the only entries in the ES log on that machine for the entire day when it happened.

I also have not been able to reproduce this. I have seen it happen twice so far, completely at random, and it has not recurred since I opened this thread.

I am still trying to figure out how this situation can occur, but I have a few more questions about your setup.

  • Do you know if this scroll query was executed against a single shard or multiple shards?

  • Were there any pre-5.4.0 nodes in the cluster when this error occurred? Are you using any plugins in this cluster?

  • Do you know if it failed on the last page of the results or somewhere in the middle?

I do not know if this is related to scroll timeouts; from our logs I can see that the query that triggered this condition failed after ~2 seconds.

Can you clarify what you meant by this? According to the stack trace, this was a scroll query, which typically consists of multiple requests. Did it take 2 seconds from the first request that initiated the scroll to this last request that failed, or did it take 2 seconds just for the last request to fail?

  1. It was executed against a specific shard
  2. No, the cluster is homogeneous (5.4.1)
  3. Unfortunately I don't know

When I said it failed after 2 seconds, I meant that 2 seconds passed between the first request that initiated the scroll and the request that failed.

Hmm, according to the stack trace, the node that failed was running 5.4.0, not 5.4.1.

Oh, sorry, you're correct. It is indeed 5.4.0. I mixed it up with a different cluster that was on 5.4.1 at the time.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.