Mass queries slow down the Elasticsearch server

Hi, I had a problem where some bots were executing mass queries against my Elasticsearch. In my logs, there were many entries like this:

    at java.lang.Thread.run(Thread.java:745)

[2016-08-11 07:03:54,166][DEBUG][action.search ] [xxxx-master-1] [3152968] Failed to execute fetch phase
RemoteTransportException[[xxxx-master-1][xx.xx.xx.11:9300][indices:data/read/search[phase/fetch/id]]]; nested: SearchContextMissingException[No search context found for id [3152968]];

The server load kept increasing up to 9, and Elasticsearch stopped responding. After I restarted all nodes, the load was back to 0.3 and everything was fine.

Is there an explanation for this?

Thanks
Nik

Usually this happens when the server is severely loaded and search contexts expire, but another node sends back results for that now-expired search context. The node doesn't know what to do with those results as it cleaned up the context a while ago, so it logs the message and drops the results.

This can happen if one node is GC'ing hard, for example. If a node receives a search request, then hits a 300 second GC, at the end of the GC it'll execute the search and send it back to the coordinating node. But since 300 seconds have passed, the coordinating node has moved on to other things and doesn't know why it's receiving this old response.

So those log messages are basically symptoms of the bots sending many queries against your server.
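To make the expiry mechanism a bit more concrete, here is a toy model in Python. It is only a sketch under assumptions, not how Elasticsearch is actually implemented: `ContextStore`, `SearchContextMissing`, and the keep-alive value are hypothetical names made up for illustration. It shows that a fetch arriving after the context has been cleaned up fails with the same kind of message as in the log.

```python
import time


class SearchContextMissing(Exception):
    """Toy stand-in for the SearchContextMissingException seen in the log."""


class ContextStore:
    """Hypothetical in-memory store of search contexts with a keep-alive."""

    def __init__(self, keep_alive_seconds, clock=time.monotonic):
        self.keep_alive = keep_alive_seconds
        self.clock = clock          # injectable so the example can be tested
        self.contexts = {}          # context id -> expiry timestamp

    def open(self, context_id):
        # Register a context that stays valid for keep_alive seconds.
        self.contexts[context_id] = self.clock() + self.keep_alive

    def reap(self):
        # Drop every context whose deadline has passed.
        now = self.clock()
        expired = [cid for cid, deadline in self.contexts.items() if deadline <= now]
        for cid in expired:
            del self.contexts[cid]

    def fetch(self, context_id):
        # A fetch for a reaped context can only be logged and dropped.
        self.reap()
        if context_id not in self.contexts:
            raise SearchContextMissing(
                f"No search context found for id [{context_id}]"
            )
        return f"results for {context_id}"
```

If the fetch for a context arrives within the keep-alive it succeeds; if the node was stalled (for example by a long GC) and the fetch arrives afterwards, `fetch` raises, which corresponds to the DEBUG lines above.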

The answer here is that you'll need to rate-limit those bots, ban them, etc. Something to prevent people from abusing your service :slight_smile:

Or add more hardware to accommodate their behavior, but that doesn't seem like a good idea imo :wink:
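As a rough illustration of the rate-limiting idea, here is a minimal token-bucket sketch in Python that could sit in whatever application layer forwards searches to Elasticsearch. Everything in it is an assumption for illustration: the class names, the rate and capacity values, and the idea of keying on a client identifier such as an IP address. It is not a feature of Elasticsearch itself.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self):
        # Refill according to elapsed time, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client id (hypothetically, the client's IP)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

A request handler would call `limiter.allow(client_ip)` before issuing the search and reject the request (for example with HTTP 429) when it returns `False`, so an abusive bot gets throttled without affecting other clients.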

Thanks for the explanation, I understand. I have blocked many bots now, but the problem persists. When it happens, Apache seems to stop working properly and starts lagging. After I restart Apache, everything runs fine again. How can I keep Apache from lagging in such a case? It seems to be waiting for something, and that slows down the other processes.