Elasticsearch with bootstrap.mlockall causes server crash

Hello,

I hope someone can assist with this -- whenever I start Elasticsearch with
the bootstrap.mlockall option enabled my entire server stops responding
after a minute or two.

There is nothing obvious in the log file, even with DEBUG enabled. I can't
find any errors in my system logs either, so I'm at a bit of a loss as to
what the issue is.

When I disable the bootstrap.mlockall option ES runs fine.

JRE -- jre1.7.0_51
ES -- elasticsearch-0.90.11
OS -- SLES11SP3 (VMware)
RAM -- 8GB
PAGING -- 5GB
ES_HEAP_SIZE -- 4GB (50% actual)

I'm building the first of a 3-node cluster.
I intended to get everything working here, then clone the VM to create the
remaining 2 nodes.
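For reference, the option being toggled is just this one line in the config (the path shown is from my install under /opt/elasticsearch; yours may differ):

```yaml
# /opt/elasticsearch/config/elasticsearch.yml (0.90.x setting name)
bootstrap.mlockall: true
```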

The "elastic" user has the following ulimits:

core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63931
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) 6965324
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 63931
virtual memory (kbytes, -v) 6960400
file locks (-x) unlimited
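To rule out the limits themselves, the two things worth checking are the locked-memory limit as the ES user actually sees it, and whether ES reports that the lock succeeded (the localhost:9200 URL assumes the default HTTP settings, and the exact API path varies a little by version):

```shell
# Locked-memory limit as the current user sees it; it must be "unlimited"
# (or at least as large as the heap) for the user that starts Elasticsearch.
ulimit -l

# Once a node is up, the nodes-info API reports whether mlockall succeeded.
# The node may simply not be running yet, so don't treat a connection
# failure here as fatal.
curl -s 'http://localhost:9200/_nodes/process?pretty' || true
```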

Here is a DEBUG log from startup to hang (at which point I have to
hard-reset the server):

[2014-02-07 00:49:44,463][INFO ][node ] [log-els01]
version[0.90.11], pid[3894], build[11da1ba/2014-02-03T15:27:39Z]
[2014-02-07 00:49:44,463][INFO ][node ] [log-els01]
initializing ...
[2014-02-07 00:49:44,464][DEBUG][node ] [log-els01]
using home [/opt/elasticsearch], config [/opt/elasticsearch/config], data
[[/var/data/elasticsearch]], logs [/opt/elasticsearch/logs], work
[/opt/elasticsearch/work], plugins [/opt/elasticsearch/plugins]
[2014-02-07 00:49:44,472][INFO ][plugins ] [log-els01]
loaded [], sites []
[2014-02-07 00:49:44,483][DEBUG][common.compress.lzf ] using
[UnsafeChunkDecoder] decoder
[2014-02-07 00:49:44,489][DEBUG][env ] [log-els01]
using node location [[/var/data/elasticsearch/logstash-cluster/nodes/0]],
local_node_id [0]
[2014-02-07 00:49:45,587][DEBUG][threadpool ] [log-els01]
creating thread_pool [generic], type [cached], keep_alive [30s]
[2014-02-07 00:49:45,595][DEBUG][threadpool ] [log-els01]
creating thread_pool [index], type [fixed], size [1], queue_size [200]
[2014-02-07 00:49:45,597][DEBUG][threadpool ] [log-els01]
creating thread_pool [bulk], type [fixed], size [1], queue_size [50]
[2014-02-07 00:49:45,598][DEBUG][threadpool ] [log-els01]
creating thread_pool [get], type [fixed], size [1], queue_size [1k]
[2014-02-07 00:49:45,598][DEBUG][threadpool ] [log-els01]
creating thread_pool [search], type [fixed], size [3], queue_size [1k]
[2014-02-07 00:49:45,598][DEBUG][threadpool ] [log-els01]
creating thread_pool [suggest], type [fixed], size [1], queue_size [1k]
[2014-02-07 00:49:45,598][DEBUG][threadpool ] [log-els01]
creating thread_pool [percolate], type [fixed], size [1], queue_size [1k]
[2014-02-07 00:49:45,598][DEBUG][threadpool ] [log-els01]
creating thread_pool [management], type [scaling], min [1], size [5],
keep_alive [5m]
[2014-02-07 00:49:45,599][DEBUG][threadpool ] [log-els01]
creating thread_pool [flush], type [scaling], min [1], size [1], keep_alive
[5m]
[2014-02-07 00:49:45,599][DEBUG][threadpool ] [log-els01]
creating thread_pool [merge], type [scaling], min [1], size [1], keep_alive
[5m]
[2014-02-07 00:49:45,599][DEBUG][threadpool ] [log-els01]
creating thread_pool [refresh], type [scaling], min [1], size [1],
keep_alive [5m]
[2014-02-07 00:49:45,599][DEBUG][threadpool ] [log-els01]
creating thread_pool [warmer], type [scaling], min [1], size [1],
keep_alive [5m]
[2014-02-07 00:49:45,599][DEBUG][threadpool ] [log-els01]
creating thread_pool [snapshot], type [scaling], min [1], size [1],
keep_alive [5m]
[2014-02-07 00:49:45,600][DEBUG][threadpool ] [log-els01]
creating thread_pool [optimize], type [fixed], size [1], queue_size [null]
[2014-02-07 00:49:45,638][DEBUG][transport.netty ] [log-els01]
using worker_count[2], port[9300-9400], bind_host[null],
publish_host[null], compress[false], connect_timeout[30s],
connections_per_node[2/3/6/1/1], receive_predictor[512kb->512kb]
[2014-02-07 00:49:45,664][DEBUG][discovery.zen.ping.multicast] [log-els01]
using group [224.2.2.4], with port [54328], ttl [3], and address [null]
[2014-02-07 00:49:45,667][DEBUG][discovery.zen.ping.unicast] [log-els01]
using initial hosts [], with concurrent_connects [10]
[2014-02-07 00:49:45,668][DEBUG][discovery.zen ] [log-els01]
using ping.timeout [3s], master_election.filter_client [true],
master_election.filter_data [false]
[2014-02-07 00:49:45,669][DEBUG][discovery.zen.elect ] [log-els01]
using minimum_master_nodes [2]
[2014-02-07 00:49:45,669][DEBUG][discovery.zen.fd ] [log-els01]
[master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2014-02-07 00:49:45,678][DEBUG][discovery.zen.fd ] [log-els01]
[node ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2014-02-07 00:49:45,704][DEBUG][monitor.jvm ] [log-els01]
enabled [true], last_gc_enabled [false], interval [1s], gc_threshold
[{old=GcThreshold{name='old', warnThreshold=10000, infoThreshold=5000,
debugThreshold=2000}, default=GcThreshold{name='default',
warnThreshold=10000, infoThreshold=5000, debugThreshold=2000},
young=GcThreshold{name='young', warnThreshold=1000, infoThreshold=700,
debugThreshold=400}}]
[2014-02-07 00:49:46,212][DEBUG][monitor.os ] [log-els01]
Using probe [org.elasticsearch.monitor.os.SigarOsProbe@7139edf6] with
refresh_interval [1s]
[2014-02-07 00:49:46,217][DEBUG][monitor.process ] [log-els01]
Using probe [org.elasticsearch.monitor.process.SigarProcessProbe@50a744be]
with refresh_interval [1s]
[2014-02-07 00:49:46,223][DEBUG][monitor.jvm ] [log-els01]
Using refresh_interval [1s]
[2014-02-07 00:49:46,228][DEBUG][monitor.network ] [log-els01]
Using probe [org.elasticsearch.monitor.network.SigarNetworkProbe@4ae0a5dc]
with refresh_interval [5s]
[2014-02-07 00:49:46,235][DEBUG][monitor.network ] [log-els01]
net_infofailed to get Network Interface Info [log-els01: log-els01: Name or
service not known]
[2014-02-07 00:49:46,238][DEBUG][monitor.fs ] [log-els01]
Using probe [org.elasticsearch.monitor.fs.SigarFsProbe@5aa22f2c] with
refresh_interval [1s]
[2014-02-07 00:49:46,570][DEBUG][indices.store ] [log-els01]
using indices.store.throttle.type [MERGE], with
index.store.throttle.max_bytes_per_sec [20mb]
[2014-02-07 00:49:46,578][DEBUG][cache.memory ] [log-els01]
using bytebuffer cache with small_buffer_size [1kb], large_buffer_size
[1mb], small_cache_size [10mb], large_cache_size [500mb], direct [true]
[2014-02-07 00:49:46,583][DEBUG][script ] [log-els01]
using script cache with max_size [500], expire [null]
[2014-02-07 00:49:46,588][DEBUG][cluster.routing.allocation.decider]
[log-els01] using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
[2014-02-07 00:49:46,588][DEBUG][cluster.routing.allocation.decider]
[log-els01] using [cluster.routing.allocation.allow_rebalance] with
[indices_all_active]
[2014-02-07 00:49:46,588][DEBUG][cluster.routing.allocation.decider]
[log-els01] using [cluster_concurrent_rebalance] with [2]
[2014-02-07 00:49:46,591][DEBUG][gateway.local ] [log-els01]
using initial_shards [quorum], list_timeout [30s]
[2014-02-07 00:49:46,735][DEBUG][indices.recovery ] [log-els01]
using max_bytes_per_sec[20mb], concurrent_streams [3], file_chunk_size
[512kb], translog_size [512kb], translog_ops [1000], and compress [true]
[2014-02-07 00:49:46,801][DEBUG][http.netty ] [log-els01]
using max_chunk_size[8kb], max_header_size[8kb],
max_initial_line_length[4kb], max_content_length[100mb],
receive_predictor[512kb->512kb]
[2014-02-07 00:49:46,805][DEBUG][indices.memory ] [log-els01]
using index_buffer_size [408.7mb], with min_shard_index_buffer_size [4mb],
max_shard_index_buffer_size [512mb], shard_inactive_time [30m]
[2014-02-07 00:49:46,806][DEBUG][indices.cache.filter ] [log-els01]
using [node] weighted filter cache with size [20%], actual_size [817.5mb],
expire [null], clean_interval [1m]
[2014-02-07 00:49:46,807][DEBUG][indices.fielddata.cache ] [log-els01]
using size [-1] [-1b], expire [null]
[2014-02-07 00:49:46,819][DEBUG][gateway.local.state.meta ] [log-els01]
using gateway.local.auto_import_dangled [YES], with
gateway.local.dangling_timeout [2h]
[2014-02-07 00:49:46,819][DEBUG][gateway.local.state.meta ] [log-els01]
took 0s to load state
[2014-02-07 00:49:46,819][DEBUG][gateway.local.state.shards] [log-els01]
took 0s to load started shards state
[2014-02-07 00:49:46,824][DEBUG][bulk.udp ] [log-els01]
using enabled [false], host [null], port [9700-9800], bulk_actions [1000],
bulk_size [5mb], flush_interval [5s], concurrent_requests [4]
[2014-02-07 00:49:46,826][DEBUG][cluster.routing.allocation.decider]
[log-els01] using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
[2014-02-07 00:49:46,826][DEBUG][cluster.routing.allocation.decider]
[log-els01] using [cluster.routing.allocation.allow_rebalance] with
[indices_all_active]
[2014-02-07 00:49:46,826][DEBUG][cluster.routing.allocation.decider]
[log-els01] using [cluster_concurrent_rebalance] with [2]
[2014-02-07 00:49:46,827][DEBUG][cluster.routing.allocation.decider]
[log-els01] using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
[2014-02-07 00:49:46,827][DEBUG][cluster.routing.allocation.decider]
[log-els01] using [cluster.routing.allocation.allow_rebalance] with
[indices_all_active]
[2014-02-07 00:49:46,827][DEBUG][cluster.routing.allocation.decider]
[log-els01] using [cluster_concurrent_rebalance] with [2]
[2014-02-07 00:49:46,836][INFO ][node ] [log-els01]
initialized
[2014-02-07 00:49:46,836][INFO ][node ] [log-els01]
starting ...
[2014-02-07 00:49:46,850][DEBUG][netty.channel.socket.nio.SelectorUtil]
Using select timeout of 500
[2014-02-07 00:49:46,850][DEBUG][netty.channel.socket.nio.SelectorUtil]
Epoll-bug workaround enabled = false
[2014-02-07 00:49:46,890][DEBUG][transport.netty ] [log-els01]
Bound to address [/0:0:0:0:0:0:0:0:9300]
[2014-02-07 00:49:46,893][INFO ][transport ] [log-els01]
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address
{inet[/172.22.123.224:9300]}
[2014-02-07 00:49:49,912][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:49:52,913][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:49:55,914][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:49:58,915][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:01,916][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:04,918][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:07,919][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:10,920][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:13,921][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:16,900][WARN ][discovery ] [log-els01]
waited for 30s and no initial state was set by the discovery
[2014-02-07 00:50:16,900][INFO ][discovery ] [log-els01]
logstash-cluster/ZFdVtQwxQLuPCArpBV6bew
[2014-02-07 00:50:16,900][DEBUG][gateway ] [log-els01]
can't wait on start for (possibly) reading state from gateway, will do it
asynchronously
[2014-02-07 00:50:16,907][INFO ][http ] [log-els01]
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address
{inet[/172.22.123.224:9200]}
[2014-02-07 00:50:16,908][INFO ][node ] [log-els01]
started
[2014-02-07 00:50:16,922][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:19,929][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:22,931][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:25,962][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:28,963][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:31,965][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:35,003][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:38,025][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:41,048][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:44,051][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-02-07 00:50:47,053][DEBUG][discovery.zen ] [log-els01]
filtered ping responses: (filter_client[true], filter_data[false]) {none}

Am I missing something obvious? Can I do anything to get more debug
information?

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/68ed18b2-805f-43ce-bb0b-c4259c7004ef%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

What about your system logs? The things under /var/log -- messages, dmesg,
the daemon log, etc.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 7 February 2014 23:47, KingBob toby.gale@gmail.com wrote:

[quoted text of the original message trimmed]


You write that your "entire server stops responding", and also that your OS
is "SLES11SP3 (VMware)".

This is not clear to me.

What exactly is not responding? Is SuSE Linux your host or your guest OS? Is
VMware not responding, is SuSE Linux not responding, or is ES not responding?
What VMware product/version is this?

I assume you run ES on a SLES11SP3 host, not a guest. If so, I recommend
keeping things simple for testing and trying ES with mlockall on the bare
server OS. mlockall only makes sense on a host, or in a hardware-virtualized
environment.

Jörg


Without answering your post directly: I've also been doing some experimenting
with a 5-node cluster built on openSUSE (your close cousin).

One of my experiments has been to restart the cluster with progressively less
memory. Surprisingly, I was able to squeeze the available memory to the point
where substantial latency appeared, yet at no time was swap touched (except
once, when only a couple of megabytes of the 2+GB total were swapped out on a
couple of nodes).

And this was with completely default JVM settings (including no mlockall).

All activity was indexing and shard allocation; no heavy queries were run.

Tony


Hi Jörg,

I understand the host (i.e. bare metal) case, but I'm not sure what you mean
by a HW-virtualized environment.

If not a guest environment, then what?

Thx,
Tony


A VM can be para-virtualized or hardware-assisted (HW-virtualized):

http://www.vmware.com/files/pdf/VMware_paravirtualization.pdf

Only with HW virtualization support in the CPU can you use mlockall
successfully on a VM.

Jörg

On Sun, Feb 9, 2014 at 9:06 PM, Tony Su tonysu999@gmail.com wrote:

[quoted text trimmed]


OK, I see you mean paravirtualization.

Regarding virtualization technologies, AFAIK the possibilities are:

- Paravirtualization, e.g. VMware, Xen, KVM, VirtualBox
- Full emulation, e.g. QEMU
- Software emulation, e.g. old VMware technology, nowadays 32-bit virtualization

So, I'm guessing all of the above except the last, plus installing on bare metal.

Tony


Hello,

Thanks to everyone for the replies.

Apologies if there was any ambiguity in my original post; allow me to clarify a few things:

VMware: VMware vSphere 5.1 (ESXi-5.1.0-20121004001-standard)

It was the entire virtual machine that stopped responding (not VMware or
the physical server ESXi runs on).

Because SLES (on the VM) was locking up entirely, it wasn't writing anything useful to the system logs; they just stopped mid-stream.

As it happens, I have resolved the issue -- I needed to add a memory reservation to stop VMware's balloon driver from reclaiming memory out from under my VM:

Right Click VM -> Edit Settings -> Resources tab -> Memory -> enable
"Reserve all guest memory (All locked)"

Hope this helps someone else out there :)

Many thanks,
KB.


Interesting, and something to keep in mind for all virtualization deployments; memory ballooning is not unique to VMware.

But I'd encourage you to do the same testing I did, depending on your objectives. I found the mlockall setting unnecessary (so far) if you're trying to prevent swapping while still allowing memory to be over-provisioned overall (letting the virtualization layer balance actual memory allocation across all instances based on activity).

Since, AFAIK, my observation is new, it may require further testing for verification.

Tony
