Failed start of 2nd instance on same host with mlockall=true


(R. Toma) #1

Hi all,

In an attempt to squeeze more power out of our physical servers we want to
run multiple ES JVMs per server.

Some specs:

  • each server has 24 cores and 256GB RAM
  • each instance binds to a different (alias) IP
  • each instance has a 32GB heap
  • both instances run under user 'elastic'
  • limits for 'elastic' user: memlock=unlimited
  • ES config for both instances: bootstrap.mlockall=true
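
For anyone reproducing this setup, here is a minimal Python sketch of how the effective memlock limit could be checked from a process started as the 'elastic' user. The helper names format_limit and memlock_limits are made up for illustration; resource.getrlimit and RLIMIT_MEMLOCK come from the Python standard library on Linux. It only reports the limit of the process that runs it and says nothing about whether mlockall will actually succeed.

```python
import resource


def format_limit(value):
    """Render an rlimit value roughly like `ulimit -l`: 'unlimited' or KiB."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit reports RLIMIT_MEMLOCK in bytes; convert to KiB for display.
    return f"{value // 1024} KiB"


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK of the current process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"memlock soft={soft} hard={hard}")
```

Run it under the same user and the same login or init path as the ES instance, since limits can differ between an interactive shell and a service start.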

The 1st instance has been running for weeks.

When starting the 2nd instance the following things happen:

  • increase of overall CPU load
  • lots of I/O to disks
  • no logging for 2nd instance
  • 2nd instance hangs
  • 1st instance keeps running, but gets slowish
  • cd /proc/ causes a hang of cd process (until 2nd instance is killed)
  • exec 'ps axuw' causes a hang of ps process (until 2nd instance is killed)

Maybe (un)related: I have never been able to run Elasticsearch in a
VirtualBox VM with memlock=unlimited and mlockall=true.

After an hour of trial and error I found that removing the
'bootstrap.mlockall' setting (or setting it to false) from the 2nd
instance's configuration fixes things.

I am confused, but acknowledge I do not know anything about memlocking.

Any ideas?

Regards,
Renzo

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b5e4770a-4194-48c9-aec4-4919dc53342a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

You should run one node per host.

Two nodes add overhead and suffer from the effects you described.

For mlockall, the user needs the privilege to allocate the specified amount
of locked memory, and the OS needs contiguous RAM per mlockall call. If the
user's memlock limit is exhausted, or if RAM allocation gets fragmented,
memlocking is no longer possible and fails.
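
One way to see whether a running instance really holds locked memory is the VmLck field in /proc/<pid>/status. The following is a rough Python sketch, not anything Elasticsearch ships: the function names locked_kib and read_locked_kib are hypothetical, and the sample text used further down is invented for illustration.

```python
def locked_kib(status_text):
    """Return the VmLck value in KiB from the text of /proc/<pid>/status.

    Returns None if the field is not present.
    """
    for line in status_text.splitlines():
        if line.startswith("VmLck:"):
            # Line looks like "VmLck:        0 kB"; the number is the second field.
            return int(line.split()[1])
    return None


def read_locked_kib(pid="self"):
    """Read VmLck for a given pid (Linux only); 'self' means the current process."""
    with open(f"/proc/{pid}/status") as f:
        return locked_kib(f.read())


if __name__ == "__main__":
    # Hypothetical sample input, only to show the parsing.
    sample = "Name:\tjava\nVmLck:\t33554432 kB\nVmRSS:\t34000000 kB\n"
    print(locked_kib(sample))
```

A value close to the heap size would suggest the lock took effect; a value of 0 would suggest it did not.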

Jörg



(R. Toma) #3

Found the following in the dmesg. Maybe I've hit a bug?

INFO: task java:18056 blocked for more than 120 seconds.
Not tainted 2.6.32-431.3.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java D 0000000000000002 0 18056 1 0x00000080
ffff883fe016fdc8 0000000000000082 0000000000000000 ffff883fe016fde8
ffff883fe016fd88 ffffffff8111f3f0 ffff881a89bc25d8 ffff883fe016fde8
ffff883edb4025f8 ffff883fe016ffd8 000000000000fbc8 ffff883edb4025f8
Call Trace:
[] ? find_get_pages_tag+0x40/0x130
[] ? prepare_to_wait+0x4e/0x80
[] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[] ? autoremove_wake_function+0x0/0x40
[] ? do_writepages+0x21/0x40
[] jbd2_complete_transaction+0x68/0xb0 [jbd2]
[] ext4_sync_file+0x121/0x1d0 [ext4]
[] vfs_fsync_range+0xa1/0x100
[] vfs_fsync+0x1d/0x20
[] do_fsync+0x3e/0x60
[] sys_fsync+0x10/0x20
[] system_call_fastpath+0x16/0x1b
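
For watching for this condition across several hosts, the hung-task header line can be picked out of dmesg output with a small script. This is only a sketch: the regular expression is written against the single message shown above, and the name hung_tasks is made up for illustration.

```python
import re

# Matches the kernel hung-task header, e.g.
# "INFO: task java:18056 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(dmesg_text):
    """Return a list of (task name, pid, seconds) tuples found in dmesg text."""
    return [
        (m["name"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_TASK.finditer(dmesg_text)
    ]


if __name__ == "__main__":
    line = "INFO: task java:18056 blocked for more than 120 seconds."
    print(hung_tasks(line))
```

The trace itself shows the java task waiting in fsync on the ext4 journal, so the pid from this line is the one to correlate with the disk I/O.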



(R. Toma) #4

Hi Jörg,

Running just 1 JVM with 32GB on a 24-core 256GB machine is a waste. CPU,
I/O and memory metrics substantiate this. And of course we need to explore
multi-instance before asking management for more money.

Regarding memlock: if no contiguous RAM is available I'd expect a fast
error, not a totally hanging process and call traces (mentioning a
120-second timeout) in the dmesg. Do you think this is maybe a JVM or
Elasticsearch bug? If so, I'll file it.

Regards,
Renzo


