Entire cluster easily disrupted with sizeable geospatial query (OOM)

Hi all,

Recently I've discovered that I can knock our entire cluster offline for a
period of time by executing a reasonably sized geo-query. My suspicion is
that this is not isolated to geo-queries, but that this is just an easy way
to reproduce the problem. I've documented a simple, complete repro case in
this gist: https://gist.github.com/olimcc/ee70a6970367b241e100

Here's what I observe:

  • I issue the query to the service.

  • Every node in the cluster becomes unresponsive over the next minute or
    so. CPU usage shoots up and stays near full consumption on every
    machine, and JVM memory is also at capacity. My assumption is that the
    query has been sent to all nodes in parallel and is now consuming their
    resources.

  • This persists for 15-20 minutes or more. Many nodes throw OOM errors.
    Nodes occasionally rejoin the cluster and re-elect masters (split brain
    is quite common).

  • The only way I've been able to completely resolve this has been to
    manually kill all nodes in the cluster and bring them back one by one.

Note that the issue is triggered by the search alone, not by the results:
there is no data stored in the service. We use a QuadPrefixTree, and my
understanding is that a number of the tree nodes are loaded into memory
before results are retrieved from them, which may be causing this. I've
attempted to estimate the number of nodes that will be loaded and block
queries from my client if the number is too great, but this seems hacky.
I'd love a proper solution.
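
To make the client-side guard described above concrete, here is a minimal Python sketch. It is not the code from the gist, and it is not how Elasticsearch or the QuadPrefixTree counts cells internally. It simply assumes the tree divides the world (360 x 180 degrees) into a 2^levels by 2^levels grid and counts the grid cells that a query bounding box overlaps at the deepest level, which is at best a crude upper bound. The function names, the `levels` argument, and the `max_cells` threshold are all hypothetical.

```python
import math

WORLD_WIDTH_DEG = 360.0
WORLD_HEIGHT_DEG = 180.0


def estimate_leaf_cells(min_lon, min_lat, max_lon, max_lat, levels):
    """Count deepest-level grid cells overlapped by a bounding box.

    Assumes a plain 2^levels x 2^levels grid over the whole world; the
    real prefix tree may visit far fewer (or differently shaped) cells.
    """
    cells_per_axis = 2 ** levels
    cell_w = WORLD_WIDTH_DEG / cells_per_axis
    cell_h = WORLD_HEIGHT_DEG / cells_per_axis

    def index(value, origin, cell_size):
        # Grid index of a coordinate, clamped to the last cell so that
        # the world's upper edge (lon 180, lat 90) stays inside the grid.
        return min(cells_per_axis - 1, math.floor((value - origin) / cell_size))

    cols = index(max_lon, -180.0, cell_w) - index(min_lon, -180.0, cell_w) + 1
    rows = index(max_lat, -90.0, cell_h) - index(min_lat, -90.0, cell_h) + 1
    return cols * rows


def is_query_allowed(bbox, levels, max_cells):
    """Return True if the estimated cell count is within the budget."""
    return estimate_leaf_cells(*bbox, levels) <= max_cells
```

With these assumptions, `is_query_allowed((-10.0, -10.0, 10.0, 10.0), levels=20, max_cells=100000)` returns False, because a 20-degree box at 20 levels overlaps billions of grid cells under this estimate. The threshold itself would have to be tuned by experiment against the cluster.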

I'm primarily interested in preventing this from happening, and I would be
really interested to hear about any ways I can do this without increasing
allocated memory. I am not concerned about recovering from split brain at
the moment (I think that's a separate issue from what causes it).

I'm wondering:

  • Is there any way Elasticsearch itself can stop this from happening?

  • Or do I need to, in my client, inspect every query before I execute
    it, to ensure it's not too large?

  • If I assume a bad query takes a long time, can I or ES kill the query
    after a set period? My research in the docs and on the mailing list
    suggests I can't do this.

Happy to provide any other information that would be useful here, and to
work through any suggestions people have.

Thanks very much,
Oli

Cluster information
Number of nodes: 5
Java: Sun Java HotSpot(TM) 64-Bit Server VM, 1.6.0_37
Heap allocation: 6 GB (on a 7.5 GB box; this doesn't match the 50% rule
often mentioned, and I can change it if it's related)
Shards: 20, approximately 10 GB each. 2 replicas.
ES Version: 0.20.4
Geo PrefixTree in use: QuadPrefixTree
