ES 5.2.2 crashes daily in Docker

Hey guys,

For some reason ES 5.2.2 crashes daily in Docker for me. Here is how I launch it:

docker run --name search -d --net=host -p 9200:9200 -p 9300:9300 -e "http.host=127.0.0.1" -e "transport.host=127.0.0.1" -v /jdata/elastic/data:/usr/share/elasticsearch/data -v /jdata/elastic/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml docker.elastic.co/elasticsearch/elasticsearch:5.2.2

My config doesn't have much:

transport.tcp.port: 9300
http.port: 9200
client.transport.ignore_cluster_name: false
client.transport.sniff: false
discovery.zen.minimum_master_nodes: 1
xpack.security.enabled: false

and here is what I have in the log before it dies:

[2017-03-26T19:08:07,160][INFO ][o.e.n.Node ] initializing ...
[2017-03-26T19:08:07,317][INFO ][o.e.e.NodeEnvironment ] [1WEhN6j] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/xvdb)]], net usable_space [294.6gb], net total_space [492gb], spins? [possibly], types [ext3]
[2017-03-26T19:08:07,318][INFO ][o.e.e.NodeEnvironment ] [1WEhN6j] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-03-26T19:08:07,368][INFO ][o.e.n.Node ] node name [1WEhN6j] derived from node ID [1WEhN6juR--PB5kOwCTJQA]; set [node.name] to override
[2017-03-26T19:08:07,374][INFO ][o.e.n.Node ] version[5.2.2], pid[1], build[f9d9b74/2017-02-24T17:26:45.835Z], OS[Linux/4.4.19-29.55.amzn1.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_92-internal/25.92-b14]
[2017-03-26T19:08:12,275][INFO ][o.e.p.PluginsService ] [1WEhN6j] loaded module [aggs-matrix-stats]
[2017-03-26T19:08:12,275][INFO ][o.e.p.PluginsService ] [1WEhN6j] loaded module [ingest-common]
[2017-03-26T19:08:12,276][INFO ][o.e.p.PluginsService ] [1WEhN6j] loaded module [lang-expression]
[2017-03-26T19:08:12,276][INFO ][o.e.p.PluginsService ] [1WEhN6j] loaded module [lang-groovy]
[2017-03-26T19:08:12,276][INFO ][o.e.p.PluginsService ] [1WEhN6j] loaded module [lang-mustache]
[2017-03-26T19:08:12,276][INFO ][o.e.p.PluginsService ] [1WEhN6j] loaded module [lang-painless]
[2017-03-26T19:08:12,277][INFO ][o.e.p.PluginsService ] [1WEhN6j] loaded module [percolator]
[2017-03-26T19:08:12,277][INFO ][o.e.p.PluginsService ] [1WEhN6j] loaded module [reindex]
[2017-03-26T19:08:12,277][INFO ][o.e.p.PluginsService ] [1WEhN6j] loaded module [transport-netty3]
[2017-03-26T19:08:12,278][INFO ][o.e.p.PluginsService ] [1WEhN6j] loaded module [transport-netty4]
[2017-03-26T19:08:12,280][INFO ][o.e.p.PluginsService ] [1WEhN6j] loaded plugin [x-pack]
[2017-03-26T19:08:13,382][WARN ][o.e.d.s.g.GroovyScriptEngineService] [groovy] scripts are deprecated, use [painless] scripts instead
[2017-03-26T19:08:21,064][INFO ][o.e.n.Node ] initialized
[2017-03-26T19:08:21,066][INFO ][o.e.n.Node ] [1WEhN6j] starting ...
[2017-03-26T19:08:21,976][WARN ][i.n.u.i.MacAddressUtil ] Failed to find a usable hardware address from the network interfaces; using random bytes: a8:90:75:a9:3a:9a:65:f3
[2017-03-26T19:08:22,269][INFO ][o.e.t.TransportService ] [1WEhN6j] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2017-03-26T19:08:22,298][WARN ][o.e.b.BootstrapChecks ] [1WEhN6j] max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2017-03-26T19:08:22,298][WARN ][o.e.b.BootstrapChecks ] [1WEhN6j] max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2017-03-26T19:08:25,499][INFO ][o.e.c.s.ClusterService ] [1WEhN6j] new_master {1WEhN6j}{1WEhN6juR--PB5kOwCTJQA}{f3tBrOMKQYqTptKjSgwDgg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-03-26T19:08:25,585][INFO ][o.e.h.HttpServer ] [1WEhN6j] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2017-03-26T19:08:25,586][INFO ][o.e.n.Node ] [1WEhN6j] started
[2017-03-26T19:08:27,631][INFO ][o.e.l.LicenseService ] [1WEhN6j] license [fba9ada3-6300-45ee-98b5-bbf355d8cfe3] mode [trial] - valid
[2017-03-26T19:08:27,670][INFO ][o.e.g.GatewayService ] [1WEhN6j] recovered [6] indices into cluster_state
[2017-03-26T19:09:32,636][ERROR][o.e.x.m.AgentService ] [1WEhN6j] exception when exporting documents
org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulks
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:148) ~[x-pack-5.2.2.jar:5.2.2]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.close(ExportBulk.java:77) ~[x-pack-5.2.2.jar:5.2.2]
at org.elasticsearch.xpack.monitoring.exporter.Exporters.export(Exporters.java:183) ~[x-pack-5.2.2.jar:5.2.2]
at org.elasticsearch.xpack.monitoring.AgentService$ExportingWorker.run(AgentService.java:196) [x-pack-5.2.2.jar:5.2.2]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_92-internal]
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulk [default_local]
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.doFlush(LocalBulk.java:114) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.flush(ExportBulk.java:62) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:145) ~[?:?]
... 4 more
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: bulk [default_local] reports failures when exporting documents
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:121) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.doFlush(LocalBulk.java:111) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.flush(ExportBulk.java:62) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:145) ~[?:?]
... 4 more
[2017-03-26T19:10:15,948][INFO ][o.e.c.r.a.AllocationService] [1WEhN6j] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[.monitoring-es-2-2017.03.26][0]] ...]).

I have ~76M small documents in ES. The total size of the data files is ~35 GB.

What should I do?

I'd probably start by fixing all the warnings that appear in your logs (the bootstrap checks).

Setting

sudo sysctl -w vm.max_map_count=262144

helped to get rid of

max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
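
To make that setting survive a reboot, I assume it can also go into the sysctl config, something like this (the exact file may vary by distro):

# hypothetical: persist the setting so it is applied on boot
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p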

but I still can't figure out how to increase the max file descriptors. The log says:

max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]

but I don't see that number anywhere on the host system:

[expert@ip-172-30-0-246 ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31863
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31863
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
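
For comparison, I can check what the containerized process actually sees (the container is named search, as in the run command above):

docker exec search sh -c 'ulimit -n'
# I'd expect this to print 4096 here, matching the bootstrap check warning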

Thoughts?

Maybe this can help:

https://www.elastic.co/guide/en/elasticsearch/reference/current/file-descriptors.html

I fixed the limit with Docker's --ulimit nofile=65536:65536.
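
So the run command now looks roughly like this (same as before, just with the ulimit flag added):

docker run --name search -d --net=host -p 9200:9200 -p 9300:9300 \
  --ulimit nofile=65536:65536 \
  -e "http.host=127.0.0.1" -e "transport.host=127.0.0.1" \
  -v /jdata/elastic/data:/usr/share/elasticsearch/data \
  -v /jdata/elastic/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  docker.elastic.co/elasticsearch/elasticsearch:5.2.2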

So you think ES was crashing because of these limits?

Probably. At the very least, going to production without those settings is a really bad idea.

Well, sadly it didn't help. But I got an exception in the log. Maybe it's related?

[2017-04-01T20:05:49,483][INFO ][o.e.g.GatewayService     ] [1WEhN6j] recovered [9] indices into cluster_state
[2017-04-01T20:06:54,318][ERROR][o.e.x.m.AgentService     ] [1WEhN6j] exception when exporting documents
org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulks
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:148) ~[x-pack-5.2.2.jar:5.2.2]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.close(ExportBulk.java:77) ~[x-pack-5.2.2.jar:5.2.2]
	at org.elasticsearch.xpack.monitoring.exporter.Exporters.export(Exporters.java:183) ~[x-pack-5.2.2.jar:5.2.2]
	at org.elasticsearch.xpack.monitoring.AgentService$ExportingWorker.run(AgentService.java:196) [x-pack-5.2.2.jar:5.2.2]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_92-internal]
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulk [default_local]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.doFlush(LocalBulk.java:114) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.flush(ExportBulk.java:62) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:145) ~[?:?]
	... 4 more
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: bulk [default_local] reports failures when exporting documents
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:121) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.doFlush(LocalBulk.java:111) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.flush(ExportBulk.java:62) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:145) ~[?:?]
	... 4 more
[2017-04-01T20:07:00,129][INFO ][o.e.c.r.a.AllocationService] [1WEhN6j] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[.monitoring-es-2-2017.04.01][0]] ...]).
[2017-04-02T00:00:03,652][INFO ][o.e.c.m.MetaDataCreateIndexService] [1WEhN6j] [.monitoring-es-2-2017.04.02] creating index, cause [auto(bulk api)], templates [.monitoring-es-2], shards [1]/[1], mappings [shards, _default_, node, index_stats, index_recovery, cluster_state, cluster_stats, indices_stats, node_stats]
[2017-04-02T00:00:03,765][INFO ][o.e.c.m.MetaDataMappingService] [1WEhN6j] [.monitoring-es-2-2017.04.02/lZ9x2xIWTRywMr1HbyqYxw] update_mapping [cluster_stats]
[2017-04-02T00:00:03,816][INFO ][o.e.c.m.MetaDataMappingService] [1WEhN6j] [.monitoring-es-2-2017.04.02/lZ9x2xIWTRywMr1HbyqYxw] update_mapping [indices_stats]
[2017-04-02T00:00:03,853][INFO ][o.e.c.m.MetaDataMappingService] [1WEhN6j] [.monitoring-es-2-2017.04.02/lZ9x2xIWTRywMr1HbyqYxw] update_mapping [index_stats]
[2017-04-02T00:00:03,916][INFO ][o.e.c.m.MetaDataMappingService] [1WEhN6j] [.monitoring-es-2-2017.04.02/lZ9x2xIWTRywMr1HbyqYxw] update_mapping [node_stats]
[2017-04-02T01:00:00,010][INFO ][o.e.x.m.e.l.LocalExporter] cleaning up [1] old indices
[2017-04-02T01:00:00,017][INFO ][o.e.c.m.MetaDataDeleteIndexService] [1WEhN6j] [.monitoring-es-2-2017.03.26/u3UAMAUgRJuTN66fmhqZlw] deleting index
[2017-04-02T07:38:24,250][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [1WEhN6j] collector [cluster-stats-collector] timed out when collecting data
[2017-04-02T07:52:14,974][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [1WEhN6j] collector [cluster-stats-collector] timed out when collecting data

But does it stop the node?

Yep. Unfortunately it does.

I just tried to uninstall X-Pack. We'll see if it helps.
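
For anyone wanting to try the same, a minimal way to strip the plugin would be a small custom image along these lines (a sketch assuming the stock 5.2.2 image; the tag elasticsearch-no-xpack is just a name I made up):

# Dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch:5.2.2
# remove the bundled X-Pack plugin (includes the monitoring exporter that throws the exceptions above)
RUN bin/elasticsearch-plugin remove x-pack

docker build -t elasticsearch-no-xpack:5.2.2 .

A lighter alternative might be to leave X-Pack installed and only disable monitoring with xpack.monitoring.enabled: false in elasticsearch.yml, since all of the exceptions above come from the monitoring exporter.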
