Too_many_scroll_contexts_exception

elasticsearch -{"statusCode":500,"error":"Internal Server Error","message":"[search_phase_execution_exception\n\tRoot causes:\n\t\ttoo_many_scroll_contexts_exception: Trying to create too many scroll contexts. Must be less than or equal to: [500]. This limit can be set by changing the [search.max_open_scroll_context] setting.]: all shards failed"}

I have had this issue for two weeks now. It seems someone is opening too many scrolls. I am trying to find out who is doing it and how it is happening.

When it happens, Kibana does not work, and some of the following curl requests do not work either.

_tasks?actions=indices:data/read/search*&detailed=true&pretty

_tasks?actions=indices:data/read/scroll&detailed=true

give me no reply from the server until I restart some of the data nodes, which I know is not the right way to fix this issue. Any more suggestions on how to track this down?
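One low-cost way to watch the problem grow is the node stats API, which reports how many search contexts each node currently holds open. A minimal sketch, assuming the cluster answers on localhost:9200 (adjust host and credentials as needed):

```shell
# Show the number of open search contexts and in-flight scrolls per node.
# open_contexts and scroll_current are standard fields of the node stats
# "search" section; no license is required for this API.
curl -s "localhost:9200/_nodes/stats/indices/search?filter_path=nodes.*.name,nodes.*.indices.search.open_contexts,nodes.*.indices.search.scroll_current&pretty"
```

Polling this periodically (e.g. from cron) can show which node accumulates contexts and roughly when the offending job runs.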


I am still facing this issue and can't find out who is creating these contexts. I am just deleting all scrolls to get by. Hope someone has some insight into it.
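For reference, the "delete all scrolls" stopgap mentioned above can be done in one call (this frees every open scroll context cluster-wide, including legitimate ones, so it is a workaround rather than a fix):

```shell
# Release all currently open scroll contexts on the cluster.
curl -s -X DELETE "localhost:9200/_search/scroll/_all"
```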

Hey Sachin,

If you have a license, you can enable audit logging to start diagnosing where the requests are coming from.

I don't have a license.


I guess someone deployed some code that retrieves a lot of data using scrolling about two weeks ago. I can’t think of any way to find out the culprit without audit logging, which does require a license, so I would recommend asking around.


Add a trial license ?
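Starting a trial takes a single API call (this enables the paid features, including audit logging, for 30 days):

```shell
# Activate a 30-day trial license on the cluster.
curl -s -X POST "localhost:9200/_license/start_trial?acknowledge=true"
```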


Hmm, I guess I will have to find all the code that uses scrolling and debug it.

This is a production cluster; I do not want to add a trial license and then have things break after it expires.

The cluster can be reverted to the basic license (what you have now) before the trial ends. I am not saying it's a complete no-op; in fact I made an error on this myself, which was the subject of an early question of mine on this very forum, but you are already better informed than I was, so you can take better care. I believe it's manageable and feasible.
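Reverting is also a single API call, which can be run before or after the trial expires:

```shell
# Go back to the free basic license.
curl -s -X POST "localhost:9200/_license/start_basic?acknowledge=true"
```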

You could also try lowering the slowlog thresholds and hope to get lucky?
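A sketch of that idea: setting the warn threshold to 0ms makes the search slowlog record every query, including its source body, which can help spot the scrolling caller. `my-index` below is a placeholder for the indices being scrolled:

```shell
# Log every search against my-index to the search slowlog.
# index.search.slowlog.threshold.query.warn is a dynamic index setting;
# remember to reset it (set it to null) once done, as it is verbose.
curl -s -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.search.slowlog.threshold.query.warn": "0ms"}'
```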

If HTTP-layer security (HTTPS) is not enabled, you can sniff port 9200.
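A minimal sniffing sketch with tcpdump (requires root on an Elasticsearch node, and only works while the HTTP layer is plaintext): the ASCII dump shows the source IP and the request body of each scroll call.

```shell
# Print the plaintext HTTP traffic hitting the Elasticsearch HTTP port,
# including client IPs and request bodies.
tcpdump -i any -A 'tcp port 9200'
```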


Thank you all for replying and giving some pointers. I have just requested the search queries from the user community, and some of them were not optimized. I pointed them to the search_after and PIT options.
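For anyone landing here, the PIT + search_after pattern replaces scroll for deep pagination without holding a scroll context per client. A sketch, assuming Elasticsearch 7.10+ and a placeholder index `my-index`:

```shell
# 1. Open a point-in-time (a lightweight, consistent view of the index).
PIT_ID=$(curl -s -X POST "localhost:9200/my-index/_pit?keep_alive=1m" |
  sed -E 's/.*"id":"([^"]+)".*/\1/')

# 2. Search against the PIT, sorted by the implicit _shard_doc tiebreaker.
curl -s "localhost:9200/_search" -H 'Content-Type: application/json' -d '{
  "size": 100,
  "query": { "match_all": {} },
  "pit": { "id": "'"$PIT_ID"'", "keep_alive": "1m" },
  "sort": [ { "_shard_doc": "asc" } ]
}'
# 3. For the next page, copy the "sort" values of the last hit into a
#    "search_after" array in the follow-up request, keeping the same PIT.
```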

I also started collecting stack monitoring for the cluster, and I am rewriting smaller queries to use the GET or mget APIs and/or the REST API in place of SQL.
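As an example of that last point, when the document IDs are already known, mget fetches them directly with no search context at all (index name and IDs below are placeholders):

```shell
# Fetch specific documents by ID instead of running a search/scroll.
curl -s "localhost:9200/_mget" -H 'Content-Type: application/json' -d '{
  "docs": [
    { "_index": "my-index", "_id": "1" },
    { "_index": "my-index", "_id": "2" }
  ]
}'
```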

Still a lot to do, but it seems like I am on the right track. :slight_smile:
