I get this message from the Python driver. If I search for it I find various detailed discussions, but they all seem to assume I'm starting from knowing a lot more than I do.
So, starting from nothing at all, what does this message mean? And then how do I fix it?
So, when we do a search in ES we do it in two roundtrips. To make sure it's consistent, we register a search context on every shard during the first roundtrip. The second roundtrip passes the context ID on, to make sure we operate on the same point-in-time snapshot as the first roundtrip. Now, if something closes the search context, i.e. if it times out (5 min by default), you will see this message. That can happen, for instance, in situations like scan/scroll: if the user doesn't come back in time, we might clean up the context. Do you use scan/scroll?
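For illustration, here is a minimal sketch of those two roundtrips using the low-level elasticsearch-py client directly (the index name and the match_all query are placeholders, not from the thread):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# First roundtrip: registers a search context on every shard and returns a
# scroll_id that points at that point-in-time snapshot.
resp = es.search(index="filebeat-*", scroll="5m", size=5000,
                 body={"query": {"match_all": {}}})
scroll_id = resp["_scroll_id"]

# Subsequent roundtrips pass the context ID back; each call also restarts the
# 5m timeout. If the context has already been cleaned up, this is the call
# that fails with the "No search context found" error.
while resp["hits"]["hits"]:
    resp = es.scroll(scroll_id=scroll_id, scroll="5m")
    scroll_id = resp["_scroll_id"]
```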
```python
for doc in helpers.scan(
    es,
    index="filebeat-*",
    doc_type="etp",
    query=scanQuery.format(args.since, "classname.keyword"),
    size=5000,
    scroll="5m",
    raise_on_error=False,  # Don't know why we sometimes get ScanError otherwise
    preserve_order=True,
):
```
Is the "default five minutes" you mentioned what the "scroll" parameter is about? Is this five minutes for a single iteration of the loop processing a single document, or for a batch of 5000 (I'm guessing here that the "size" parameter divides the results up into batches of 5000 but I don't see why that should be my business), or for the entirety of the loop processing all the query results?
The 5m is the time to process one batch of 5k results in your case. That is one single roundtrip to ES before you have to fetch the next batch. ES is not stream-based, even though the Python API might imply that.
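In other words, each batch of `size` hits must be consumed within the `scroll` window, and the clock restarts whenever the helper fetches the next batch. A hedged sketch of how the two parameters trade off (the `handle` function is hypothetical):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

for doc in helpers.scan(
    es,
    index="filebeat-*",
    size=500,      # smaller batches mean less work per scroll window
    scroll="10m",  # a longer window leaves more time to process each batch
):
    handle(doc)  # hypothetical per-document processing
```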
Thanks. I think that's now enough information for me to experiment with various timings, batch sizes, error detection and retries. (Which, as ever, will no doubt turn out to be far more work than just getting the original logic right.)
I see this error when the search context has timed out so it's no longer present in the cluster. The helpers.scan function is a generator which hides the underlying multiple trips to elasticsearch. If you are not processing the results fast enough, @TimWard, it can lead to the context being cleaned up in elasticsearch which will make the next request for that scroll_id (which is done internally in the helper) produce this error.
To verify that this is the case you can turn on logging at the beginning of your script to see the individual requests being sent to elasticsearch:
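The original snippet isn't preserved in this thread; a minimal sketch of such a setup, assuming the standard elasticsearch-py loggers, might look like:

```python
import logging

# Show the client's per-request logs: elasticsearch-py logs each request on
# the "elasticsearch" logger (with a more verbose "elasticsearch.trace"
# logger available as well).
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("elasticsearch").setLevel(logging.DEBUG)
```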
@honzakral I wonder if we should allow scans to renew their context without consuming results; that way we could make this API easier to use. Your client could, as long as it's making progress, go back and tell ES to keep things open? Just an idea...
Hmm, that would definitely be a nice solution, but I am a bit worried about side effects that might not be immediately obvious to users - people who never finish iterating over the generator would then keep a context alive in elasticsearch indefinitely, which would be bad.
Ultimately I think the current approach is good in that it promotes the idea that scan/scroll is meant for a quick export of data - not for keeping a "cursor" open while you perform an expensive, long-running operation on every document. If that is your situation, I feel you should use some form of background processing with a queue and a pool of workers anyway (see the sketch below).
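A hedged sketch of that pattern, where the scan loop only drains batches onto a queue and a pool of threads does the expensive per-document work (`process_doc` is hypothetical):

```python
import queue
import threading

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
work = queue.Queue(maxsize=10000)  # bounded, so the scan can't outrun memory

def worker():
    while True:
        doc = work.get()
        if doc is None:       # sentinel: no more work
            work.task_done()
            return
        process_doc(doc)      # hypothetical expensive operation
        work.task_done()

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()

# The scan loop itself now only enqueues, so each batch is consumed well
# within the scroll window.
for doc in helpers.scan(es, index="filebeat-*", scroll="5m"):
    work.put(doc)

for _ in threads:
    work.put(None)  # one sentinel per worker
for t in threads:
    t.join()
```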
To improve the user experience, would there be a way to keep a tombstone of an expired search context, to give the user more accurate info? "Your scroll timed out, try increasing your scroll parameter" would be so much more helpful in this case, if the overhead is not too big.
Another option would be a streaming API to/from elasticsearch, where this would be handled on the coordinating node (potentially the same for bulk), but that sounds to me like more trouble than it's worth...