Failed to execute search phase

We are seeing a lot of warnings in our cluster whenever we use the scroll API to search for data.

[2019-02-08T16:03:53,314][DEBUG][o.e.a.s.TransportSearchScrollAction] [node01] [594681] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [node12][10.131.54.126:9300][indices:data/read/search[phase/query/scroll]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [594681]
	at org.elasticsearch.search.SearchService.getExecutor(SearchService.java:499) ~[elasticsearch-6.5.4.jar:6.5.4]
	at org.elasticsearch.search.SearchService.runAsync(SearchService.java:353) ~[elasticsearch-6.5.4.jar:6.5.4]
	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:428) ~[elasticsearch-6.5.4.jar:6.5.4]
	at org.elasticsearch.action.search.SearchTransportService$8.messageReceived(SearchTransportService.java:376) ~[elasticsearch-6.5.4.jar:6.5.4]
	at org.elasticsearch.action.search.SearchTransportService$8.messageReceived(SearchTransportService.java:373) ~[elasticsearch-6.5.4.jar:6.5.4]

The search is successful and I am able to see the data. But what do these warnings mean?

These warnings are also accompanied by a lot of GC errors:

[GC (Allocation Failure) 2019-02-08T15:47:48.183+0530: 17465.903: [ParNew
Desired survivor size 34865152 bytes, new threshold 6 (max 6)
- age   1:    4209728 bytes,    4209728 total
- age   2:      11008 bytes,    4220736 total
- age   3:     101144 bytes,    4321880 total
- age   4:       1328 bytes,    4323208 total
- age   5:       1024 bytes,    4324232 total

Hi Nachiket,

When this happens, it is often because the TTL you set on the scroll (see: keeping the scroll context alive) has expired. That would be the first thing to check. Once the TTL expires, the search context is closed, and this is the error you get when you try to use the same context again.

Another thing to check is that you are using the scroll_id returned from each request, not just the first scroll_id that you received.
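
To illustrate both points, here is a minimal sketch of a scroll loop with the Python client. It assumes client is an elasticsearch-py Elasticsearch instance (anything exposing search, scroll and clear_scroll with these keyword arguments would do). The helper name scroll_all and the defaults for keep_alive and page_size are made up for the example. The keep-alive is renewed on every scroll call, so it only has to cover the time needed to process one page, not the whole export.

```python
def scroll_all(client, index, query, keep_alive="1m", page_size=500):
    """Yield every hit for `query`, always sending the latest scroll_id.

    `client` is assumed to behave like elasticsearch-py's Elasticsearch
    object; `keep_alive` and `page_size` are illustrative defaults.
    """
    # The initial search opens the search context and returns the first page.
    resp = client.search(index=index, body=query,
                         scroll=keep_alive, size=page_size)
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            # Each scroll call renews the TTL and may return a new scroll_id,
            # so use the one from the most recent response.
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Release the search context instead of waiting for the TTL to expire.
        client.clear_scroll(scroll_id=scroll_id)
```

You would call it as something like for hit in scroll_all(es, "my-index", {"query": {"match_all": {}}}): ..., where es and the index name are placeholders for your own client and index.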

Yes, I have set the TTL to 10 seconds. I will try a higher setting, though since I am doing this with the Python client, I would rather not set it too high.

What is the six-digit ID [594681] shown in the logs? A scroll ID is normally much longer, isn't it?

I kept the scroll context the same and, instead of passing an array of Elasticsearch hosts to the esclient library, used the coordinating node's address.

Now I no longer receive the scroll context error, but the GC errors are still there. What could be the reason?
