This query brought down my cluster: {"query": {"function_score": {"query": {"match_all": {}}, "random_score": {"seed": 123456}}}}

Hi,
I'd like to share my experience and, at the same time, hope to get some
tips.

The query was run against an index with about 700 million documents.
Two things happened:

  1. The node that ran this query crashed. It is the node configured not to
    process data.

  2. The data nodes started garbage collecting heavily. Eventually old
    generation GC could not reduce the heap usage and the nodes became
    unresponsive. In some cases old generation GC even seemed to increase
    the size of the heap:

2014-12-20 07:21:03,370][WARN ][monitor.jvm ] [****]
[gc][young][2796041][224976] duration [1.1s], collections [1]/[1.3s], total
[1.1s]/[3.4h], memory [21.5gb]->[21.2gb]/[29.8gb], all_pools {[young]
[1.4gb]->[3.4mb]/[1.4gb]}{[survivor]
[191.3mb]->[191.3mb]/[191.3mb]}{[old] [19.9gb]->[21gb]/[28.1gb]}

It is a bad query in itself, but I expected the ES cluster to handle it
gracefully. It does throw this exception:

  • Caused by: org.elasticsearch.common.breaker.CircuitBreakingException:
    [FIELDDATA] Data too large, data for [_uid] would be larger than limit of
    [19206989414/17.8gb]

I guess ES stopped at some point because the field data exceeded the default
limit, but by then it was too late to stop the query that caused the heap
memory issue. I am wondering if there is anything obviously wrong with my ES
cluster configuration.

I have 5 boxes, each with 125G of RAM and 32 cores. I deploy two data nodes
on each of them, with the heap fixed at 31G, and the configuration favors bulk
ingestion. I actually saw ingest throughput above 60K documents per second.
It was working fine until that query came.
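As a rough sanity check on where that limit comes from, the byte count in the exception can be compared with the heap maximum reported in the GC log line above (29.8gb). This is only a sketch: it assumes ES prints sizes in binary units (1gb = 1024^3 bytes) and that the data node that threw the exception had the same heap maximum as the node in the GC log. The variable names are made up for illustration.

```python
# Values copied from the exception and from the GC log line in this post.
LIMIT_BYTES = 19206989414   # "limit of [19206989414/17.8gb]"
HEAP_MAX_GIB = 29.8         # "memory [21.5gb]->[21.2gb]/[29.8gb]"

GIB = 1024 ** 3             # assumption: "gb" in ES output means 2**30 bytes

# Convert the breaker limit to GiB and express it as a fraction of the heap.
limit_gib = LIMIT_BYTES / GIB
fraction_of_heap = limit_gib / HEAP_MAX_GIB

print(f"breaker limit: {limit_gib:.2f} GiB")
print(f"fraction of reported heap max: {fraction_of_heap:.2f}")
```

Under those assumptions the limit works out to roughly 60% of the usable heap, which fits a percentage-of-heap fielddata breaker limit rather than a hand-set absolute value. It also suggests that a single field (`_uid` here) was on track to need more fielddata than that share of the heap on a node.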

Thanks,

Jack

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ae1b7ea6-d801-4d67-b047-69ab54f1f38b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

OK, I think I misread the old generation GC issue. The line I quoted shows
the old generation memory size after a young GC. Growth there is expected
after a young generation GC, because some surviving objects are promoted
from young to survivor and from survivor to old.
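To make that reading of the log line easier to check, here is a small sketch that pulls the collector type and the per-pool before/after sizes out of the quoted line. The regular expressions are written against this one line only; they are not an official description of the ES log format, and the helper names (`to_bytes`, `parse_pools`, `gc_kind`) are hypothetical.

```python
import re

# The monitor.jvm line quoted in the original post (timestamp prefix omitted).
LOG_LINE = (
    "[gc][young][2796041][224976] duration [1.1s], collections [1]/[1.3s], total "
    "[1.1s]/[3.4h], memory [21.5gb]->[21.2gb]/[29.8gb], all_pools {[young] "
    "[1.4gb]->[3.4mb]/[1.4gb]}{[survivor] "
    "[191.3mb]->[191.3mb]/[191.3mb]}{[old] [19.9gb]->[21gb]/[28.1gb]}"
)

# Assumption: sizes are printed in binary units.
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

# Matches one "{[pool] [before]->[after]/[max]}" group.
POOL = re.compile(r"\{\[(\w+)\] \[([^\]]+)\]->\[([^\]]+)\]/\[([^\]]+)\]\}")


def to_bytes(text):
    """Convert a size such as '19.9gb' or '3.4mb' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb)", text)
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_pools(line):
    """Return {pool_name: (before, after, maximum)} in bytes."""
    return {
        name: (to_bytes(before), to_bytes(after), to_bytes(maximum))
        for name, before, after, maximum in POOL.findall(line)
    }


def gc_kind(line):
    """Return which collector the line reports, e.g. 'young' or 'old'."""
    return re.search(r"\[gc\]\[(\w+)\]", line).group(1)


pools = parse_pools(LOG_LINE)
print("collector:", gc_kind(LOG_LINE))
for name, (before, after, _maximum) in pools.items():
    delta_gib = (after - before) / UNITS["gb"]
    print(f"{name:9s} change: {delta_gib:+.2f} GiB")
```

The collector field is `young`, so this line reports a young collection, not an old one. The young pool shrinks by about 1.4gb while the old pool grows by about 1.1gb, which matches the overall `memory [21.5gb]->[21.2gb]` drop of about 0.3gb within the rounding of the printed values.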

Jinyuan (Jack) Zhou

On Mon, Dec 22, 2014 at 7:28 PM, Jinyuan Zhou zhou.jinyuan@gmail.com
wrote:


