Context switches and high load

Hi,

I've noticed that my servers - both physical machines and Xen-based VMs, all running a pair of ES nodes (0.19.8) on Ubuntu 12.04 LTS server, in 64-bit and 32-bit flavours - show extremely high numbers of context switches and interrupts in vmstat output. ES performance, though, is fine.

Here's a sample from the Xen VM:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 5912 187796 40628 312880 0 0 0 0 1948 3261 0 0 100 0
0 0 5912 187920 40628 312884 0 0 0 0 1680 3000 0 0 100 0
0 0 5912 187620 40656 312856 0 0 28 84 2224 2936 7 1 92 0
0 0 5912 187612 40656 312888 0 0 0 64 1839 3043 4 1 95 0
0 0 5912 252364 40656 312888 0 0 0 0 1921 3339 1 1 98 0

(Stopped ES here)

0 0 5912 252364 40656 312888 0 0 0 0 140 152 0 0 100 0
0 0 5912 252636 40656 312856 0 0 0 0 101 148 0 0 100 0
0 0 5912 252636 40656 312856 0 0 0 0 141 181 0 0 100 0
0 0 5912 252636 40656 312856 0 0 0 84 102 154 0 0 100 0
0 0 5912 252636 40656 312856 0 0 0 0 98 155 0 0 100 0
0 0 5912 252760 40656 312856 0 0 0 0 109 159 0 0 100 0
0 0 5912 252760 40656 312856 0 0 0 0 127 181 0 0 100 0
0 0 5912 252760 40656 312856 0 0 0 0 154 197 0 0 100 0

(Restarted it here)

3 0 5912 271392 40824 312996 0 0 320 24 1831 662 15 5 79 0
2 0 5912 222604 40824 313020 32 0 32 212 3968 1226 45 10 45 0
2 0 5912 243700 40824 313052 0 0 0 0 2571 1252 23 5 71 0

This is on a box which is basically idle - no indexing or searching activity. CPU usage is very low, about 1%, yet system load is high, around 1.1 (on a 4-core VM). The VM above is a small test system, with only about 50 docs in ES in a single index.
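
For anyone wanting to confirm which process is actually generating the switches, something like the following should do it - a minimal sketch, assuming the sysstat package is installed; the PID is a placeholder for the ES JVM:

pidstat -w 1                 # voluntary/involuntary context switches per process, one-second samples
pidstat -w -t -p <es-pid> 1  # the same, broken down per thread for a single ES JVM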

On the VM, each ES node has a tiny 128MB heap allocated; on the physical machine, each node has 6GB (the machine has 16GB of RAM). Other than the heap size and the http/transport ports, all settings are default.
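
For reference, the overrides amount to something like this - just a sketch, the port numbers are examples and ES_HEAP_SIZE assumes a startup script that honours it (older scripts use ES_MIN_MEM/ES_MAX_MEM instead):

# elasticsearch.yml for the second node on the box
http.port: 9201
transport.tcp.port: 9301

# heap size, exported in the environment the startup script reads
export ES_HEAP_SIZE=128m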

The physical box exhibits similar characteristics, though, with 31m docs taking about 75GB across 16 indices.

In each case, I'm running two nodes on one machine (just a test setup to check clustering - production will be multiple machines). Knocking out a node halves the context switches, but it doesn't reduce them to 'idle' levels of a couple of hundred.

Performance seems fine, happily. But does this ring any alarm bells for people? Anything I should be worried about? Do other people see this? Or should I just not care?

Here's some system info from the VM:

root@linode-3:/var/log/supervisor# java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.3) (6b24-1.11.3-1ubuntu0.12.04.1)
OpenJDK Client VM (build 20.0-b12, mixed mode, sharing)
root@linode-3:/var/log/supervisor# uname -a
Linux linode-3.secondsync.com 3.4.2-linode44 #1 SMP Tue Jun 12 15:04:46 EDT 2012 i686 i686 i386 GNU/Linux

... and from the physical machine:

root@server-2:~# java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.3) (6b24-1.11.3-1ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
root@server-2:~# uname -a
Linux server-2.secondsync.com 3.2.0-24-generic #37-Ubuntu SMP Wed Apr 25 08:43:22 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Cheers,
Dan

Dan Fairs | dan.fairs@gmail.com | @danfairs | www.fezconsulting.com

--

Hi,

I just had a very similar problem on a newly installed ES (latest version
0.19.9) - it was burning all the CPU, and strace was showing a lot of
futex() calls. It turned out to be the leap second inserted between June
and July (the server had been running since then, but hadn't run a JVM until today).
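
If you want to check whether you're hitting the same thing, a syscall summary over a few seconds makes the futex storm obvious (the PID is a placeholder for the ES JVM; Ctrl-C stops the trace and prints the summary):

strace -c -f -p <jvm-pid>
# on an affected JVM, futex() dominates the call counts by a huge margin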

As someone suggested here a few posts ago:

/etc/init.d/ntp stop
date -s "$(date)"   # re-set the clock to itself, which clears the stuck leap-second state
/etc/init.d/ntp start

Cheers,
Maciej

--

Hi Maciej,

Thanks for that. I'm pretty sure that this isn't the leap second thing though, as I had that a while ago. The difference in my case is that CPU utilisation is very low - it's the interrupt/context switch counts that are high.

Cheers,
Dan

Dan Fairs | dan.fairs@gmail.com | @danfairs | www.fezconsulting.com

--

Dan,

Is this causing any issues, though?

Otis

Scalable Performance Monitoring - Sematext

--

We're seeing this as well - only in our case, it's not working very well :-/
Indexing is definitely working, but our context switches hover around 1k/second,
and when running queries they can jump as high as 3k/second.

Using Java 7 and ES 0.20.1 on a 27-node cluster of m1.xlarge instances.

We even tried changing the threadpool configuration for search (and, to be
certain, also for index, bulk and refresh) - we wanted to make the size
explicitly twice the number of cores:

threadpool:
  [bulk, search, index, refresh]:
    type: blocking
    min: 1
    size: 8
    wait_time: 30s
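
In case it helps anyone else chasing this: watching the per-thread view of the JVM while queries run at least shows which threads are busy (the PID is a placeholder; jstack ships with the JDK):

top -H -p <es-pid>   # per-thread CPU for the ES JVM
jstack <es-pid>      # thread names with native ids (nid, in hex) to match against top's TIDs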

--

Hi Dan,

High context switching is mostly down to the way Netty works. The issue
was dealt with in https://github.com/netty/netty/pull/613 back in
August.

I also observe numbers around 1k to 3k with the default settings under
Linux. This should be considered OK as long as system CPU stays below
15-20% and your system load is low.
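
If you want to see how much of that is down to the Netty I/O threads, counting the JVM's threads and skimming a thread dump is usually enough (the PID is a placeholder; the 'worker' naming is an assumption - check your own jstack output for the exact thread names):

grep Threads /proc/<jvm-pid>/status   # total thread count for the node
jstack <jvm-pid> | grep -ci worker    # rough count of netty/transport worker threads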

Jörg

--

We also see elevated read iops when we're having issues, roughly
commensurate with the increase in context switches, so in retrospect
they're almost certainly the real cause. We do see a strong correlation
between intervals of extreme CPU consumption (90-100%) and the elevated
context switches, but after rebooting the cluster we see even higher
context switches with reasonable CPU consumption.

At this point, I suspect our issue lies somewhere else.

--

So sorry to be late back to this - I only just noticed the reply.

No, the high load doesn't actually seem to cause any problems - let's just say it made my spidey senses tingle!

The notes in the netty ticket say it's fixed in 3.5.8 - ES's pom.xml for 0.19.8 indicates it uses 3.5.1. When we do an ES upgrade, I'll keep an eye out to see if this has been improved.
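
(For anyone checking their own install: assuming the standard tar.gz layout, the bundled jar is visible in the lib directory - the path below is just an example.

ls /path/to/elasticsearch/lib | grep -i netty
# should show something like netty-3.5.1.Final.jar on 0.19.8, going by the pom)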

Cheers,
Dan

Dan Fairs | dan.fairs@gmail.com | @danfairs | secondsync.com

--