Failed start of 2nd instance on same host with mlockall=true

R_Toma · August 26, 2014, 12:54pm

Hi all,

In an attempt to squeeze more power out of our physical servers we want to
run multiple ES JVMs per server.

Some specs:

each server has 24 cores, 256GB RAM
each instance binds on different (alias) ip
each instance has 32GB heap
both instances run under user 'elastic'
limits for 'elastic' user: memlock=unlimited
es config for both instances: bootstrap.mlockall=true
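
For anyone wanting to double-check the memlock limit a process actually sees, here is a minimal sketch in Python. It is not part of the original setup: it assumes Linux and the standard `resource` module, and the function name `describe_memlock_limit` is made up for illustration.

```python
import resource


def describe_memlock_limit():
    """Return the soft and hard RLIMIT_MEMLOCK of the current process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def fmt(value):
        # RLIM_INFINITY is how the kernel reports memlock=unlimited.
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

    return {"soft": fmt(soft), "hard": fmt(hard)}


if __name__ == "__main__":
    print(describe_memlock_limit())
```

The result only describes the process that runs the script, so it is most useful when started as the 'elastic' user in the same way the Elasticsearch instances are started; limits configured for a login session are not necessarily inherited by a service started some other way.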

The 1st instance has been running for weeks.

When starting the 2nd instance the following things happen:

increase of overall CPU load
lots of I/O to disks
no logging for 2nd instance
2nd instance hangs
1st instance keeps running, but gets slowish
cd /proc/ causes a hang of cd process (until 2nd instance is killed)
exec 'ps axuw' causes a hang of ps process (until 2nd instance is killed)

Maybe (un)related: I have never been able to run Elasticsearch in a
virtualbox with memlock=unlimited and mlockall=true.

After an hour of trial and error I found that removing the
'bootstrap.mlockall' setting (effectively setting it to false) from the
2nd instance's configuration fixes things.

I am confused, but acknowledge I do not know anything about memlocking.

Any ideas?

Regards,
Renzo

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b5e4770a-4194-48c9-aec4-4919dc53342a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · August 26, 2014, 3:10pm

You should run one node per host.

Two nodes add overhead and suffer from the effects you described.

For mlockall, the user needs the privilege to allocate the specified locked
memory, and the OS needs contiguous RAM per mlockall call. If the user's
memlock limit is exhausted, or if RAM allocation gets fragmented,
memlocking is no longer possible and fails.
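
One way to see whether locking actually took effect for a running instance is the `VmLck` field in `/proc/<pid>/status` on Linux. A minimal sketch, assuming Python is available on the host; the helper names `locked_kb` and `read_locked_kb` are hypothetical:

```python
import re


def locked_kb(status_text):
    """Return the VmLck value in kB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmLck:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_locked_kb(pid):
    """Read /proc/<pid>/status for a live process and return its locked memory in kB."""
    with open(f"/proc/{pid}/status") as handle:
        return locked_kb(handle.read())
```

A value of 0 kB for a JVM started with bootstrap.mlockall=true would suggest the lock did not succeed; a value close to the process size would suggest it did.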

Jörg

R_Toma · August 27, 2014, 7:53am

Found the following in the dmesg. Maybe I've hit a bug?

INFO: task java:18056 blocked for more than 120 seconds.
Not tainted 2.6.32-431.3.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java D 0000000000000002 0 18056 1 0x00000080
ffff883fe016fdc8 0000000000000082 0000000000000000 ffff883fe016fde8
ffff883fe016fd88 ffffffff8111f3f0 ffff881a89bc25d8 ffff883fe016fde8
ffff883edb4025f8 ffff883fe016ffd8 000000000000fbc8 ffff883edb4025f8
Call Trace:
[] ? find_get_pages_tag+0x40/0x130
[] ? prepare_to_wait+0x4e/0x80
[] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[] ? autoremove_wake_function+0x0/0x40
[] ? do_writepages+0x21/0x40
[] jbd2_complete_transaction+0x68/0xb0 [jbd2]
[] ext4_sync_file+0x121/0x1d0 [ext4]
[] vfs_fsync_range+0xa1/0x100
[] vfs_fsync+0x1d/0x20
[] do_fsync+0x3e/0x60
[] sys_fsync+0x10/0x20
[] system_call_fastpath+0x16/0x1b
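
The trace above shows the java task stuck in fsync, waiting on an ext4/jbd2 journal commit. To check whether other tasks were reported as hung around the same time, the kernel's "blocked for more than N seconds" lines can be pulled out of saved dmesg output. A rough sketch in Python, assuming the message format shown above; the name `find_hung_tasks` is made up for illustration:

```python
import re

# Matches lines such as:
#   INFO: task java:18056 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(dmesg_text):
    """Return (task name, pid, timeout in seconds) for every hung-task report."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(dmesg_text)
    ]
```

Feeding it the text of `dmesg` saved to a file gives one tuple per report, which makes it easy to see whether only the 2nd instance's java process is affected or other processes as well.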

R_Toma · August 27, 2014, 7:59am

Hi Jörg,

Running just 1 JVM with 32GB on a 24-core 256GB machine is a waste. CPU,
I/O and memory metrics substantiate this. And of course we need to explore
multi-instance before asking mgmt for more money.

Regarding memlock: if no contiguous RAM is available I'd expect a fast
error, not a totally hanging process and call traces (mentioning a
120-second timeout) in the dmesg. Do you think this is maybe a JVM or
Elasticsearch bug? If so, I'll file it.

Regards,
Renzo
