Elasticsearch stalls for more than 10 minutes when restarted

When I restart Elasticsearch, it stalls for more than 10 minutes at "logger":"o.e.g.GatewayService", "log":"recovered [0] indices into cluster_state"

I don't know the root cause. Please help me. Thanks for your time!

{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:16.669Z","logger":"o.e.p.PluginsService","timezone":"UTC","marker":"[es_node] ","log":"loaded module [transport-netty4]"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:16.669Z","logger":"o.e.p.PluginsService","timezone":"UTC","marker":"[es_node] ","log":"loaded module [tribe]"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:16.669Z","logger":"o.e.p.PluginsService","timezone":"UTC","marker":"[es_node] ","log":"loaded plugin [ingest-attachment]"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:16.670Z","logger":"o.e.p.PluginsService","timezone":"UTC","marker":"[es_node] ","log":"loaded plugin [ingest-geoip]"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:16.670Z","logger":"o.e.p.PluginsService","timezone":"UTC","marker":"[es_node] ","log":"loaded plugin [ingest-user-agent]"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"WARN","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:19.480Z","logger":"o.e.d.c.s.Settings","timezone":"UTC","marker":"[es_node] ","log":"[http.enabled] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version."}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:20.487Z","logger":"o.e.d.DiscoveryModule","timezone":"UTC","marker":"[es_node] ","log":"using discovery type [zen] and host providers [settings]"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:21.204Z","logger":"o.e.n.Node","timezone":"UTC","marker":"[es_node] ","log":"initialized"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:21.204Z","logger":"o.e.n.Node","timezone":"UTC","marker":"[es_node] ","log":"starting ..."}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:21.357Z","logger":"o.e.t.TransportService","timezone":"UTC","marker":"[es_node] ","log":"publish_address {172.17.0.15:9300}, bound_addresses {0.0.0.0:9300}"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:21.371Z","logger":"o.e.b.BootstrapChecks","timezone":"UTC","marker":"[es_node] ","log":"bound or publishing to a non-loopback address, enforcing bootstrap checks"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:24.444Z","logger":"o.e.c.s.MasterService","timezone":"UTC","marker":"[es_node] ","log":"zen-disco-elected-as-master ([0] nodes joined), reason: new_master {es_node}{B7xQoWLSQ2KV5vYUrJFRlA}{YwMOe7tQRu6qbh9xRZsCfg}{172.17.0.15}{172.17.0.15:9300}"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:24.454Z","logger":"o.e.c.s.ClusterApplierService","timezone":"UTC","marker":"[es_node] ","log":"new_master {es_node}{B7xQoWLSQ2KV5vYUrJFRlA}{YwMOe7tQRu6qbh9xRZsCfg}{172.17.0.15}{172.17.0.15:9300}, reason: apply cluster state (from master [master {es_node}{B7xQoWLSQ2KV5vYUrJFRlA}{YwMOe7tQRu6qbh9xRZsCfg}{172.17.0.15}{172.17.0.15:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:24.488Z","logger":"o.e.h.n.Netty4HttpServerTransport","timezone":"UTC","marker":"[es_node] ","log":"publish_address {172.17.0.15:9200}, bound_addresses {0.0.0.0:9200}"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:33:24.488Z","logger":"o.e.n.Node","timezone":"UTC","marker":"[es_node] ","log":"started"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T**09:33:24**.500Z","logger":"o.e.g.GatewayService","timezone":"UTC","marker":"[es_node] ","log":"recovered [0] indices into cluster_state"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"WARN","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T**09:48:36**.955Z","logger":"o.e.d.c.m.MetaDataCreateIndexService","timezone":"UTC","marker":"[es_node] ","log":"the default number of shards will change from [5] to [1] in 7.0.0; if you wish to continue using the default of [5] shards, you must manage this on the create index request or with an index template"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:48:37.046Z","logger":"o.e.c.m.MetaDataCreateIndexService","timezone":"UTC","marker":"[es_node] ","log":"[logstash-2019.09.05] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:48:37.539Z","logger":"o.e.c.m.MetaDataMappingService","timezone":"UTC","marker":"[es_node] ","log":"[logstash-2019.09.05/Ie4GTU_UQ0KGvGYH2qdsag] create_mapping [fluentd]"}
{"type":"log","host":"local-altiplano-elasticsearch-0","level":"INFO","systemid":"e7c84e34d38a49e1ae639a5dab455af5","system":"BELK","time": "2019-09-05T09:48:38.350Z","logger":"o.e.c.m.MetaDataMappingService","timezone":"UTC","marker":"[es_node] ","log":"[logstash-2019.09.05/Ie4GTU_UQ0KGvGYH2qdsag] update_mapping [fluentd]"}

Can you share your elasticsearch.yml please? And any other settings you are using?

The Elasticsearch.yml is the following:
version: '3.4'
networks:
  esnet:
    external: true
services:
  kibana:
    image: "${ARTIFACTORY_IP}:9000/elk_k:${KIBANA_VERSION}"
    container_name: anv-kibana
    restart: always
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200
      - TZ=UTC
    depends_on:
      - elasticsearch
    networks:
      - esnet
  elasticsearch:
    image: "${ARTIFACTORY_IP}:9000/elk_e:${ES_VERSION}"
    container_name: anv-elasticsearch
    restart: always
    ports:
      - "9200:9200"
    volumes:
      - /fnms/data_backup:/data/elasticsearch_backup
      - /fnms/config:/etc/elasticsearch/config
    environment:
      - ES_JAVA_OPTS=-Xms26g -Xmx26g
      - TZ=UTC
    networks:
      - esnet

The .env is the following:
ARTIFACTORY_IP=10.131.69.114
KIBANA_VERSION=6.5.4-19.03.01
ES_VERSION=6.5.4-19.01.0

The issue only occurs when restarting Elasticsearch. Please advise. Thank you!

Actually, the issue doesn't always appear; it just occurs most of the time.

Given that you don't make any changes to your config, and the error is not consistent, it's hard to get to the bottom of the issue. You can turn on DEBUG logging to see more detailed output; one way to do that is sketched below. Also make sure to monitor disk and memory resources.
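
A minimal sketch of raising the log level at runtime via the cluster settings API, assuming the node is reachable on localhost:9200 (the logger name org.elasticsearch.gateway is just an illustrative choice, and note that a transient setting is cleared again on a full cluster restart):

curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "logger.org.elasticsearch.gateway": "DEBUG"
  }
}'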

This is your docker-compose file, not elasticsearch.yml.
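
For comparison, a minimal, purely illustrative elasticsearch.yml (none of these values are taken from your setup) looks more like this:

cluster.name: my-cluster
node.name: es_node
network.host: 0.0.0.0
http.port: 9200
path.data: /usr/share/elasticsearch/data
path.logs: /usr/share/elasticsearch/logs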

The log you posted shows that Elasticsearch has started and is running.
What do you get with curl -XGET http://elasticsearch:9200/
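
If the node is up, that call typically returns a small JSON banner along these lines (the values here are illustrative, not taken from your cluster):

{
  "name" : "es_node",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "6.5.4"
  },
  "tagline" : "You Know, for Search"
}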

Yes, it's running. But it stalls at o.e.g.GatewayService for more than 10 minutes. This only occurs when it's restarted.

And when that happens, there is nothing else in the logs?

Are you able to log on to the machine and run journalctl -xe or service elasticsearch status?
I am not really familiar with Docker images, so that might not be possible...
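
Since Elasticsearch is running in a container here, the rough Docker equivalents would be something like the following (container name taken from the compose file above; the curl call assumes curl is available inside the image):

# follow the Elasticsearch container's log output
docker logs -f anv-elasticsearch

# check cluster health from inside the container
docker exec -it anv-elasticsearch curl -s "http://localhost:9200/_cluster/health?pretty"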

Actually, when Elasticsearch is restarted, Kibana stops displaying logs. Elasticsearch then comes back up but stops logging at o.e.g.GatewayService. Kibana only starts displaying logs again once Elasticsearch continues with MetaDataCreateIndexService and MetaDataMappingService. The delay is more than 10 minutes. I don't know how to find the root cause.

Sorry, that's my mistake. I use Docker to run Elasticsearch.
I think the image below is the elasticsearch.yml.

Thanks for your advice. I tried that, but the log level resets to INFO automatically when Elasticsearch is restarted. Do you have any ideas?
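
My guess is that a transient cluster setting is cleared when the whole cluster restarts. Perhaps a line like the following in elasticsearch.yml would keep the level across restarts, but I have not verified this:

logger.org.elasticsearch.gateway: DEBUG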

I'm struggling to understand what the issue is here. This sounds like completely normal behaviour for Elasticsearch. It doesn't log much when it's running normally, and recovered [0] indices into cluster_state is usually the last line it logs when starting up.

Can you explain more clearly why you think this is a problem?

When it starts for the first time, it runs and shows "logger":"o.e.c.m.MetaDataMappingService", "log":"[logstash-2019.09.05/Ie4GTU_UQ0KGvGYH2qdsag] update_mapping [fluentd]" as the last line without any delay. But when restarted, it stops at the "recovered [0] indices into cluster_state" log for that period of time and only then continues with the "update_mapping [fluentd]" log.

Hence I think this is a problem.

Are you using persistent storage for your data path /data/data?
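
For reference, persisting the data path usually means mounting a host directory (or named volume) at the container's data path in the compose file. A hedged sketch, where both the host directory /fnms/data and the in-container path /data/data are assumptions based on this thread rather than verified against the image:

  elasticsearch:
    volumes:
      - /fnms/data:/data/data
      - /fnms/data_backup:/data/elasticsearch_backup
      - /fnms/config:/etc/elasticsearch/config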

I found out the root cause. It's the default request_timeout of 5s in Fluentd's Elasticsearch output.

The Fluentd process can get into this state when every attempt to write logs to an Elasticsearch instance takes longer than 5 seconds to complete. Until a consistent number of writes takes less than 5 seconds to complete, logging essentially stops working.

In my case, setting request_timeout to 1m makes it work fine, as in the snippet below.
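
For anyone else hitting this, a hedged sketch of the relevant part of the Fluentd output configuration (the match pattern, host, and port are illustrative, not my exact settings):

<match **>
  @type elasticsearch
  host elasticsearch
  port 9200
  logstash_format true
  # the plugin default is 5s
  request_timeout 1m
</match>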

Thank you so much for your time!

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1497836

Expanding on @Christian_Dahlqvist's question, the message indicates that there are no indices (i.e. no data at all) when the node restarts. It sounds like you previously inserted some data, so the question is why it went away. The most likely explanation is that your data path is emptied when you restart your container.

I don't think anything is wrong with Elasticsearch here (it looks like it's working normally and waiting for data), but maybe the process that you expect to send it data is waiting to find the index it created before the node restarted; after 10 minutes perhaps it gives up and creates a new index.
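
One hedged way to check this, assuming the node is reachable on localhost:9200, is to list the indices right before and right after a restart and see whether they survive:

curl -XGET "http://localhost:9200/_cat/indices?v"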

Yes, you're right. The root cause is the configuration for Fluentd.
