Very high sys cpu usage with HTTP KeepAlive

Hello,

I am using Elasticsearch 1.0.0 with Kibana 3 on Solaris 11.1. My environment is as follows.

$ ./elasticsearch -v
Version: 1.0.0, Build: a46900e/2014-02-12T16:18:34Z, JVM: 1.7.0_07

$ java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) Server VM (build 23.3-b01, mixed mode)

$ cat /etc/release
Oracle Solaris 11.1 X86
Copyright (c) 1983, 2012, Oracle and/or its affiliates. All rights reserved.
Assembled 19 September 2012

I was suffering from very high CPU usage by the Java process.
When I connected to ES via Kibana, the CPU usage rose to 90%-100%. Strangely, user CPU usage was not very high, but sys CPU usage was very high according to vmstat.
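For reference, a minimal way to see that split is plain vmstat sampling (assuming the standard Solaris columns):

$ vmstat 5   # watch the us/sy columns under "cpu": us stayed low while sy was pegged with Kibana connected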

I examined system calls with the truss command and found a huge number of ioctl calls while Kibana was connected.

$ sudo truss -c -p 18496
syscall               seconds   calls  errors
read                     .000      13       4
write                    .000      12
ioctl                   1.043  142731
lwp_cond_wait            .002     178     153
lwp_cond_signal          .000      25
pollsys                  .000      54
lwp_mutex_timedlock      .000       1
pread64                  .016    1432
                     --------  ------    ----
sys totals:             1.062  144446     157
usr time:                .387
elapsed:                3.620

$ sudo truss -p 18496
...
/47: ioctl(99, DP_POLL, 0xF6CB5750) = 1
/48: ioctl(102, DP_POLL, 0xF6C746D0) = 1
/49: ioctl(105, DP_POLL, 0xF6C33650) = 1
/48: ioctl(102, DP_POLL, 0xF6C746D0) = 1
/47: ioctl(99, DP_POLL, 0xF6CB5750) = 1
/48: ioctl(102, DP_POLL, 0xF6C746D0) = 1
/47: ioctl(99, DP_POLL, 0xF6CB5750) = 1
/49: ioctl(105, DP_POLL, 0xF6C33650) = 1
/47: ioctl(99, DP_POLL, 0xF6CB5750) = 1
/48: ioctl(102, DP_POLL, 0xF6C746D0) = 1
/49: ioctl(105, DP_POLL, 0xF6C33650) = 1
/47: ioctl(99, DP_POLL, 0xF6CB5750) = 1
/48: ioctl(102, DP_POLL, 0xF6C746D0) = 1
/49: ioctl(105, DP_POLL, 0xF6C33650) = 1
/47: ioctl(99, DP_POLL, 0xF6CB5750) = 1
/48: ioctl(102, DP_POLL, 0xF6C746D0) = 1
/47: ioctl(99, DP_POLL, 0xF6CB5750) = 1
...

I suspected there was something wrong with HTTP KeepAlive, so I decided to put an HTTP proxy between ES and Kibana (the client browser) to disable KeepAlive. I set up Apache mod_proxy to disable KeepAlive with the following settings.

ProxyPass /es http://localhost:9200
ProxyPassReverse /es http://localhost:9200
SetEnv proxy-nokeepalive 1

Then I changed Kibana's config.js to connect to ES via the proxy. After that, the CPU usage came down dramatically. There may be some problem with HTTP KeepAlive, at least on Solaris.
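For reference, a quick way to sanity-check that the proxy path reaches ES before pointing Kibana at it (assuming Apache listens on port 80 of the same host; the /es prefix matches the ProxyPass location above, and ES should answer with its usual banner JSON):

$ curl -s http://localhost/es/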

Thanks,
K.Cima

I think you do not need a proxy.

Just disable TCP keep-alive for HTTP. It is enabled by default.

http.tcp_keep_alive: false

Jörg


Thank you for the information.
I tried both of the following settings in elasticsearch.yml right away.

http.tcp_keep_alive: false
network.tcp.keep_alive: false

But unfortunately they seemed to have no effect; the CPU usage became high again.
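For reference, one way to confirm that a node actually picked these settings up after a restart, assuming the node-info settings metric is available on 1.0:

$ curl -s "localhost:9200/_nodes/settings?pretty" | grep -i keep_alive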

K.Cima

Just a thought: what do you have pre-configured in Kibana?
If you have dashboards that pull large amounts of data at small time intervals, that could have a big effect.

Also, I've found that the webserver you use can have a large effect. I haven't compared running Kibana as a plugin against running it on Apache, but I know that when I fire up any Elasticsearch-related web tool on Apache, there is a large effect on my systems (interestingly, if Apache is idle and no one is actively using a web tool, there is no significant load).

Tony


Thank you for your feedback.
I am using Kibana with the default config, and I am testing ES with a single index that contains only about 1 MB of documents.
So I don't think data size or queries from Kibana are causing the high CPU load in my case.
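For reference, index sizes can be listed quickly with the _cat API available in 1.0 (doc counts and store size per index):

$ curl -s "localhost:9200/_cat/indices?v"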

K.Cima

To clearly separate Kibana from whatever it may be monitoring, maybe deploy Kibana on a completely separate machine?

That way, anything related to deploying Kibana can be isolated and studied, while local sensors (e.g. Marvel) and other processes that are not strictly part of Kibana but may be invoked by it remain on the ES nodes.

And, as I suggested, launching on a different webserver engine might make a difference.

Tony


Also, I'm not sure exactly what caused your problem, but these may help:

  1. Tells you what threads are doing what in ES:

curl "localhost:9200/_nodes/hot_threads"

  2. Tells you how many HTTP connections have been opened so far:

curl "localhost:9200/_nodes/stats/http?pretty"

If that HTTP total count keeps increasing very fast, it means HTTP keep-alive is not working somewhere in the chain between Kibana and ES.
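A rough way to watch that counter while reloading Kibana is just to resample the stats endpoint above every few seconds, e.g.:

$ while true; do curl -s "localhost:9200/_nodes/stats/http?pretty" | grep total_opened; sleep 5; done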


Thank you for your reply.
I installed Kibana on a separate host to eliminate any effect of the webserver, but the CPU usage on the ES host became high just like before.
The hot threads are http_server_worker threads, and the total_opened connection count did not increase while I repeatedly connected from Kibana. So I think KeepAlive does work, but there still seems to be something wrong with it. There may be some problem between the HTTP KeepAlive implementation and Solaris /dev/poll.

http://server:9200/_nodes/hot_threads

::: [Tom Corsi][OOF_Kl3qS82C0GuJX-nzdA][server][inet[/192.168.0.3:9300]]

   100.7% (503.6ms out of 500ms) cpu usage by thread 'elasticsearch[Tom Corsi][http_server_worker][T#8]{New I/O worker #26}'
     10/10 snapshots sharing following 15 elements
       sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
       sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:220)
       sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:84)
       sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
       sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
       org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
       org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
       org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
       org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
       org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
       org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
       org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       java.lang.Thread.run(Thread.java:722)

   100.7% (503.3ms out of 500ms) cpu usage by thread 'elasticsearch[Tom Corsi][http_server_worker][T#3]{New I/O worker #21}'
     10/10 snapshots sharing following 15 elements
       sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
       sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:220)
       sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:84)
       sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
       sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
       org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
       org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
       org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
       org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
       org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
       org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
       org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       java.lang.Thread.run(Thread.java:722)

   100.7% (503.3ms out of 500ms) cpu usage by thread 'elasticsearch[Tom Corsi][http_server_worker][T#2]{New I/O worker #20}'
     10/10 snapshots sharing following 15 elements
       sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
       sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:220)
       sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:84)
       sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
       sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
       org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
       org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
       org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
       org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
       org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
       org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
       org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       java.lang.Thread.run(Thread.java:722)

http://server:9200/_nodes/stats/http?pretty

{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "cfkkA--DTCa1oOW2LDHomg" : {
      "timestamp" : 1392985424001,
      "name" : "Volpan",
      "transport_address" : "inet[/192.168.0.3:9300]",
      "host" : "server",
      "ip" : [ "inet[/192.168.0.3:9300]", "NONE" ],
      "http" : {
        "current_open" : 4,
        "total_opened" : 5
      }
    }
  }
}


I found a similar issue reported about Netty. I updated Java from 1.7.0_07 to 1.7.0_51, and after that the heavy CPU load was gone. It seems to have been a Java bug.
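For reference, the JVM each node is actually running can also be checked from the REST API, assuming the node-info jvm metric on 1.0:

$ curl -s "localhost:9200/_nodes/jvm?pretty" | grep -E '"version"|"vm_version"'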

Yes, it is a known Java "epoll" bug that is fixed in Java >= 7u11.

Jörg


All right. Thank you!