Help me debug CPU use issues


(Aivars Irmejs) #1

When I start or restart the ES service, its CPU use hovers at 3%, which is
great. After some time and some use, CPU use climbs to 30%-100% and stays
at about that level even when idle. The first time it happens, flushing
helps: after the flush, CPU use returns to a pleasant 3%. Eventually even
that stops helping and only a service restart does the trick.

30%-100% CPU use could be tolerated theoretically since the server has
multiple CPUs, but it's a VPS and I don't want to use more resources than I
need. ES works perfectly with 3% CPU use before it starts misbehaving.

What I want to figure out most of all is why this happens, and ideally how
to make it stay at the 3% level all the time, but I'm not sure how to go
about debugging it.

Hot threads: https://gist.github.com/kaitnieks/9363338
OpenJDK 1.7.0_51
1 node, 300 indices
multicast disabled
happens with and without network.tcp.blocking, but when
network.tcp.blocking is false, CPU use seems to be a little higher.

Thanks,
Aivars

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0566e0b0-16cc-4f16-a297-2e187b0cbdcc%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Mark Walkom) #2

Use Oracle Java, not OpenJDK; that might help.

Also check what happens with GC. Something like the Marvel or ElasticHQ
plugins will give you insight into that.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


(Aivars Irmejs) #3

Changing Java would not be trivial. Knowing that ES gets tested against
OpenJDK during development, could Oracle Java really help, or is this just
a standard answer to all kinds of issues? If there's a big chance that it
will help, I will do it. Even better if there is some kind of diagnostic
that could tell me that OpenJDK is at fault.

Thanks,
Aivars


(Mark Walkom) #4

It's standard/best practice to use Oracle's Java for performance and
stability.
There are a number of old list posts around this that would be worth
looking into.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


(Aivars Irmejs) #5

Recently I've seen posts saying that the latest version of OpenJDK is fine.
I'd rather find a reliable way to debug the issue than start changing
random things.

Thanks,
Aivars


(Clinton Gormley) #6

Look at the output from the hot threads API to see what is consuming the
CPU. Also, I'd check your garbage collection times (look in the logs and
in the nodes stats output), and make sure that you have zero bytes in swap.
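The three checks above (heap pressure, garbage collection times, swap) can
be read from a single nodes stats response. The following is a minimal
sketch, not a definitive tool. It assumes a node reachable at
http://localhost:9200 and the field names used by the 1.x nodes stats
output (jvm.mem.heap_used_percent, jvm.gc.collectors, os.swap.used_in_bytes).
Both the path and the field names vary between versions, so missing fields
are returned as None instead of raising. The helper names are made up for
this example.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_node(node):
    """Extract heap, GC and swap figures from one node's stats entry.

    Field names are an assumption based on the 1.x nodes stats layout;
    anything that is absent comes back as None.
    """
    jvm = node.get("jvm", {})
    collectors = jvm.get("gc", {}).get("collectors", {})
    gc = {
        # collector name -> (number of collections, total time in ms)
        name: (c.get("collection_count"), c.get("collection_time_in_millis"))
        for name, c in collectors.items()
    }
    return {
        "heap_used_percent": jvm.get("mem", {}).get("heap_used_percent"),
        "gc": gc,
        "swap_used_bytes": node.get("os", {}).get("swap", {}).get("used_in_bytes"),
    }


def summarize(stats):
    """Map each node id in a nodes stats response to its summary."""
    return {
        node_id: summarize_node(node)
        for node_id, node in stats.get("nodes", {}).items()
    }
```

Calling summarize(fetch_json("http://localhost:9200/_nodes/stats")) once in
the fresh state and once in the degraded state gives two snapshots to
compare. The hot threads output is plain text rather than JSON and can be
fetched from /_nodes/hot_threads with any HTTP client.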


(Jörg Prante) #7

You say "after some use". Can you tell us about the garbage collection?
What heap size do you use?

Jörg


(Aivars Irmejs) #8

I'll start with the fact that I tried Oracle Java and that didn't help.

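I did some graphing and here is some more information via bigdesk - the
good, fresh state:
http://kaitnieks.com/images/elasticsearch_good.png
The worsened state:
http://kaitnieks.com/images/elasticsearch_less_good.png
The images indeed show that GC is more active and that the fight against
the heap memory limit occurs more often.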

The question is - why does it happen, knowing that ElasticSearch idles in
both cases (in the 2nd image it has finished indexing and is waiting for
the next round), and how do I return to the good, clean state without
restarting? Are there any commands to get rid of any caches etc. (which I
don't need) that might help?
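One command of that kind is the clear cache API, a POST to /_cache/clear
(or /<index>/_cache/clear for a single index). The sketch below only shows
how such a request could be built and sent. It assumes a node at
http://localhost:9200, the helper names are hypothetical, and nothing in
this thread establishes that caches are what fills the heap here, so treat
it as an experiment rather than a fix.

```python
from urllib.request import Request, urlopen


def clear_cache_request(base_url, index=None):
    """Build a POST request for the clear cache endpoint.

    With index=None the request targets all indices; otherwise only the
    named index. The base URL is an assumption of this example.
    """
    path = "/_cache/clear" if index is None else "/%s/_cache/clear" % index
    return Request(base_url.rstrip("/") + path, data=b"", method="POST")


def send(request, timeout=10):
    """Send a prepared request and return the response body as text."""
    with urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, send(clear_cache_request("http://localhost:9200")) clears the
caches of every index. Comparing heap use before and after the call shows
whether caches account for the growth.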

Thanks,
Aivars


(Mark Walkom) #9

Can you provide node stats, RAM, heap size, document count, index size etc?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


(Aivars Irmejs) #10

RAM: 1.5 GB
Heap: 700 MB
Index count: 303
Document count (http://h3.blumentals.net:9200/_count - like so?): 2400432 -
which seems more than expected...

Thanks,
Aivars


(Mark Walkom) #11

700MB heap is really, really small.
It looks like you're hitting the limits of your node; you either need to
remove some documents or add more nodes.
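A rough back-of-the-envelope calculation puts these numbers in
perspective. The heap size, index count and document count come from this
thread. The figure of 5 primary shards per index is an assumption (it was
the default at the time, but the thread does not state the actual
setting), and replicas are ignored because there is a single node.

```python
HEAP_MB = 700            # from the thread
INDEX_COUNT = 303        # from the thread
DOC_COUNT = 2400432      # from the thread
SHARDS_PER_INDEX = 5     # ASSUMPTION: default primary shard count, not stated


def heap_budget(heap_mb, index_count, shards_per_index, doc_count):
    """Return how thinly the heap is spread over shards and indices."""
    shard_count = index_count * shards_per_index
    return {
        "shards": shard_count,
        "heap_kb_per_shard": heap_mb * 1024 / shard_count,
        "heap_mb_per_index": heap_mb / index_count,
        "docs_per_index": doc_count / index_count,
    }


budget = heap_budget(HEAP_MB, INDEX_COUNT, SHARDS_PER_INDEX, DOC_COUNT)
for key, value in budget.items():
    print("%s: %.1f" % (key, value))
```

Under that assumption the node holds 1515 shards, which leaves less than
half a megabyte of heap per shard. Each shard is a separate Lucene index
with its own fixed overhead, so the number of indices and shards may
matter here as much as the number of documents.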

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


(Aivars Irmejs) #12

I get that it's not several GB like most of you use, but while ES is still
fresh after a start it works wonderfully - it searches quickly and does not
consume CPU. All I want is to find a way to prevent it from going stale,
e.g. to turn off whatever makes the memory fill up unnecessarily (for my
use case), so that it remains in the fresh state where only a small
amount of memory is consumed.

Thanks,
Aivars



(Mark Walkom) #13

At this stage you're going to be wasting more time than it's worth to save
a few tens of megabytes (maybe).

Someone else might have some ideas though.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Aivars Irmejs) #14

After some aggressive limiting I think I have managed to get what I wanted.
I think it was one of the caches, probably the field data cache, that was
eating up all the memory and never letting it go.

If someone has a similar problem and a similar use case (daily-weekly
reindexing, not that many searches), maybe this can help you:

threadpool.index.size: 4
threadpool.search.size: 10
threadpool.suggest.size: 2
threadpool.get.size: 3
threadpool.bulk.size: 2
threadpool.percolate.size: 2
threadpool.warmer.type: fixed
threadpool.warmer.size: 5
threadpool.refresh.type: fixed
threadpool.refresh.size: 5

indices.fielddata.cache.size: 20%
indices.fielddata.cache.expire: 4m

index.cache.filter.max_size: 15
indices.cache.filter.expire: 4m

index.refresh_interval: 5s
index.translog.flush_threshold_size: 30mb

indices.memory.index_buffer_size: 20%
indices.memory.min_shard_index_buffer_size: 1mb
indices.memory.min_index_buffer_size: 1mb

I suspect that indices.fielddata was it, but I'm throwing in other
settings, too, just in case it's a combination of settings that works.
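
To put the percentage-based settings above in perspective, here is a rough sketch of what they allow on the 700 MB heap mentioned earlier in the thread. The helper name `heap_budget` is made up for illustration and is not part of Elasticsearch; the sketch assumes both percentages are taken relative to the JVM heap.

```python
# Rough heap budget for the percentage-based settings quoted above.
# The 700 MB heap figure comes from earlier in this thread; the helper
# name and structure are illustrative only, not an Elasticsearch API.

HEAP_MB = 700

# Assumption: both settings are interpreted as a percentage of the JVM heap.
PERCENT_SETTINGS = {
    "indices.fielddata.cache.size": 20,
    "indices.memory.index_buffer_size": 20,
}


def heap_budget(heap_mb, percent_settings):
    """Return {setting: megabytes} for settings given as a percent of heap."""
    return {name: heap_mb * pct / 100.0 for name, pct in percent_settings.items()}


if __name__ == "__main__":
    budget = heap_budget(HEAP_MB, PERCENT_SETTINGS)
    for name, mb in sorted(budget.items()):
        print("%s: up to about %.0f MB" % (name, mb))
    rest = HEAP_MB - sum(budget.values())
    print("left for everything else: about %.0f MB" % rest)
```

Under those assumptions each of the two settings is capped at roughly 140 MB, which leaves about 420 MB of the heap for everything else.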

Regards,
Aivars



(Aivars Irmejs) #15

Unfortunately this is still not ideal... :( ElasticSearch now behaves well
for much, much longer, but eventually CPU use rises, and looking at
BigDesk I can see 2 things:

  1. GC is actively clearing Old Gen objects (which I assume causes the CPU load)
  2. Heap memory rises much more steeply than when just launched,
    even if ES is idling (which, I assume, causes the much more vigorous GC activity)

Can anyone explain to me:

  1. Why does heap memory rise at all when ES is idling?
  2. Is there a way to get a good picture of exactly what takes up how much
    heap memory?

Thanks,
Aivars
P.S. Turns out I only have 50,000 documents; the 2 million came from a
plugin that didn't have the courtesy to delete its trash during uninstall.
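
On the second question above: one possible starting point (a sketch, not an answer confirmed anywhere in this thread) is the node stats API, which reports heap use alongside the memory held by the field data and filter caches. The field names below are assumed from the ES 1.x response format, and `heap_breakdown`, `fetch_node_stats` and the `localhost` URL are hypothetical names chosen for illustration.

```python
import json
from urllib.request import urlopen


def heap_breakdown(node_stats):
    """Summarise heap-related counters for each node in a node stats response.

    Assumes ES 1.x style field names; any missing counter is reported as 0.
    """
    rows = []
    for node_id, node in node_stats.get("nodes", {}).items():
        mem = node.get("jvm", {}).get("mem", {})
        indices = node.get("indices", {})
        rows.append({
            "node": node.get("name", node_id),
            "heap_used": mem.get("heap_used_in_bytes", 0),
            "heap_max": mem.get("heap_max_in_bytes", 0),
            "fielddata": indices.get("fielddata", {}).get("memory_size_in_bytes", 0),
            "filter_cache": indices.get("filter_cache", {}).get("memory_size_in_bytes", 0),
        })
    return rows


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch node stats as a dict (the URL is an assumption; adjust to your node)."""
    with urlopen(base_url + "/_nodes/stats") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `heap_breakdown(fetch_node_stats())` once in the fresh state and once in the degraded state, then comparing the two results, should show whether the caches account for the growth or whether the heap is filling up with something these counters do not cover.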



(system) #16