Heap / GC Issues


(Ned Campion) #1

Hey All,

I've got a cluster with 5 data nodes (2 master nodes). The cluster has ~100
indices, with doc counts in the 1k - 50k range. There is a low/medium amount
of index load going into the cluster via the bulk API and a large amount of
search traffic, in the range of 40K queries per second.

I'm running these data nodes on EC2 (c3.8xl's) with a 30GB heap, though at
the time of the following sample I was testing out running with a 20GB
heap. The process runs well for a while, a couple of hours to a day or two
depending on traffic, and then it gets into a bad state where it is
continually doing long GC runs, i.e. every minute doing a stop-the-world run
for 30-45 sec, and seemingly getting very little out of it (e.g. starting with
18.8GB heap usage and going to 18.3GB heap usage).

Here the red line is a data node that is exhibiting the behavior. This is a
graph of the "old" generation growing to nearly the complete heap size and
then staying there for hours. During this time the application is severely
degraded.

https://lh4.googleusercontent.com/-JXEVIJVBDDY/U8kyUY7hhyI/AAAAAAAACBo/dceW7JJGKiA/s1600/Screen+Shot+2014-07-18+at+10.37.44+AM.png

Example of one of the gc runs during this time (again they run every minute
or so).

[2014-07-18 00:24:24,735][WARN ][monitor.jvm ] [prod-targeting-es2]
[gc][old][10799][27] duration [41.5s], collections [1]/[42.5s], total
[41.5s]/[2.2m], memory [18.8gb]->[18.3gb]/[19.8gb], all_pools {[young]
[733.2mb]->[249.9mb]/[1.4gb]}{[survivor] [86mb]->[0b]/[191.3mb]}{[old]
[18gb]->[18.1gb]/[18.1gb]}
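
To put a number on how little a run like this frees, the line above can be picked apart with a short script. This is only a sketch: the regular expressions are written against this one sample line rather than any documented log format, the duration is assumed to be reported in seconds, and the names `to_gb` and `summarize_gc_line` are made up for illustration.

```python
import re

# The WARN line from above, joined back onto a single line.
LOG_LINE = (
    "[2014-07-18 00:24:24,735][WARN ][monitor.jvm ] [prod-targeting-es2] "
    "[gc][old][10799][27] duration [41.5s], collections [1]/[42.5s], total "
    "[41.5s]/[2.2m], memory [18.8gb]->[18.3gb]/[19.8gb], all_pools {[young] "
    "[733.2mb]->[249.9mb]/[1.4gb]}{[survivor] [86mb]->[0b]/[191.3mb]}{[old] "
    "[18gb]->[18.1gb]/[18.1gb]}"
)

# Multipliers to convert the size suffixes seen in the line into gigabytes.
UNIT_TO_GB = {"b": 1 / 1024**3, "kb": 1 / 1024**2, "mb": 1 / 1024, "gb": 1.0}


def to_gb(text):
    """Convert a size such as '733.2mb' or '18gb' to a float in GB."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb)", text)
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    return float(match.group(1)) * UNIT_TO_GB[match.group(2)]


def summarize_gc_line(line):
    """Pull the pause duration and overall heap figures out of one log line."""
    # Assumption: the duration is given in seconds, as in the sample.
    duration = re.search(r"duration \[([\d.]+)s\]", line)
    memory = re.search(r"memory \[([^\]]+)\]->\[([^\]]+)\]/\[([^\]]+)\]", line)
    if duration is None or memory is None:
        raise ValueError("line does not look like the sample GC warning")
    before, after, total = (to_gb(value) for value in memory.groups())
    return {
        "duration_s": float(duration.group(1)),
        "reclaimed_gb": before - after,
        "reclaimed_pct_of_heap": (before - after) / total * 100,
        "used_after_pct": after / total * 100,
    }


if __name__ == "__main__":
    for key, value in summarize_gc_line(LOG_LINE).items():
        print("%s: %.2f" % (key, value))
```

For the sample line this reports roughly half a gigabyte reclaimed by a pause of over 40 seconds, with the heap still more than 90% full afterwards.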

We are running ES 1.2.2. We had been running Oracle Java 7u25 and we've tried
upgrading to 7u65 with no effect. I just did a heap dump analysis using
jmap and Eclipse Memory Analyzer and found that 85% of the heap was taken
up by the filter cache.

https://lh4.googleusercontent.com/-KZ8SJD-o32M/U8kzdtC0KhI/AAAAAAAACBw/TeWTvmOc1rc/s1600/Screen+Shot+2014-07-18+at+1.34.44+AM.png

We are doing a lot of "bool" conditions in our queries, so that may be a
factor in the hefty filter cache.

Any ideas out there? Right now I have to bounce my data nodes every hour or
two to ensure I don't reach this degraded state.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/288da6e7-b85a-4cbf-a83d-d777ee7c9c57%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

How many indexes and how much data do you have?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


(Ned Campion) #3

Hey Mark,

The index is currently 16GB with 10 indices (added and removed daily,
updated at a fairly low but constant pace) and just 448k documents between
them (w/ geoshape + terms indexes)

Thanks for any and all help! I can provide further info if desired.

Best,
Ned


(Mark Walkom) #4

How much in total, in the entire cluster?


(Ned Campion) #5

5 data nodes as described, 2 master nodes, and 50 clients connected
directly. That's it. Documents total 448k. Hope that's what you mean.

Thank you for the help


(Mark Walkom) #6

You said you have one index of 16GB but that you have ~100 indexes in
total. How much data is there across all those indexes, i.e. the cluster
total in GB/TB?


(Ned Campion) #7

Did I? Sorry, it's 16GB across 100 indices.

On Friday, July 18, 2014, Mark Walkom markw@campaignmonitor.com wrote:

You said you have one index of 16GB but that you have ~100 indexes in
total. How much data across all those indexes, the cluster total in GB/TB.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
<javascript:_e(%7B%7D,'cvml','markw@campaignmonitor.com');>
web: www.campaignmonitor.com

On 19 July 2014 11:25, Ned Campion <nedcampion@gmail.com
<javascript:_e(%7B%7D,'cvml','nedcampion@gmail.com');>> wrote:

5 data nodes as described, and 2 master nodes, 50 clients connected
directly, that's it for docements 448k, hope that's what you mean

Thank you for the help

On Friday, July 18, 2014, Mark Walkom <markw@campaignmonitor.com
<javascript:_e(%7B%7D,'cvml','markw@campaignmonitor.com');>> wrote:

How much in total, in the entire cluster.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 19 July 2014 10:55, Ned Campion nedcampion@gmail.com wrote:

Hey Mark,

The index is currently 16GB with 10 indices (added and removed daily,
updated at a fairly low but constant pace) and just 448k documents
between them (w/ geoshape + terms indexes)

Thanks for any and all help! and further info if desired

Best,
Ned

On Fri, Jul 18, 2014 at 8:36 PM, Mark Walkom <markw@campaignmonitor.com

wrote:

How many indexes and how much data do you have?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


(Jörg Prante) #8

You are on the right track and have already found the answer to your
question: examine your queries. Their filters seem to be cached and are
eating your heap.

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/filter-caching.html

Jörg

On Fri, Jul 18, 2014 at 5:09 PM, Ned Campion nedcampion@gmail.com wrote:

We are doing a lot of "bool" conditions in our queries, so that may be a
factor in the hefty filter cache.


(Clinton Gormley) #9

Your filter cache is only taking up 3GB of the heap, which fits with the
default limit of 10% of heap space. So the filter cache is not at fault
here.

I would look at the two usual suspects:

  • field data - how much space is this consuming? Try:
    curl 'localhost:9200/_nodes/stats/indices/fielddata?fields=*&pretty'
  • swap - is it completely disabled?
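To compare field data usage across nodes, the output of the curl call above can be summarized with a small script. This is only a sketch: it assumes the node stats response has the shape nodes -> <node id> -> indices -> fielddata -> memory_size_in_bytes, and the helper names (fetch_node_stats, fielddata_bytes_per_node) and the localhost URL are made up for illustration.

```python
import json
import urllib.request


def fielddata_bytes_per_node(stats):
    """Map node name -> field data memory in bytes.

    Assumes the response shape
    nodes -> <node id> -> indices -> fielddata -> memory_size_in_bytes.
    Missing entries are reported as 0 rather than raising.
    """
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        fielddata = node.get("indices", {}).get("fielddata", {})
        # Fall back to the node id when the response carries no node name.
        name = node.get("name", node_id)
        result[name] = fielddata.get("memory_size_in_bytes", 0)
    return result


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch the same stats as the curl command (base_url is an assumption)."""
    url = base_url + "/_nodes/stats/indices/fielddata?fields=*"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Calling fielddata_bytes_per_node(fetch_node_stats()) on a machine that can reach the cluster gives one number per node, which can then be compared against the heap size.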

Swap is a common gotcha. If you have any data in swap, it will slow GCs
down terribly and give you the scenario that you are seeing. See the docs
for how to disable swap:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html#setup-configuration-memory

(I prefer disabling swap completely, rather than relying on mlockall or
swappiness)
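
A quick way to check the swap question on a Linux data node is to read /proc/meminfo. The sketch below is an illustration only: the function names are hypothetical, and it reports system-wide swap, not whether the Elasticsearch process itself has pages swapped out. A SwapTotal of 0 means swap is disabled completely.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Key:   <number> [kB]" lines.
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB (0 when swap is disabled or unused)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory summary (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a data node, swap_used_kb(read_meminfo()) should be 0 if swap is off or untouched; anything above 0 is worth investigating before tuning the heap further.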


(system) #10