Hi all,
I'm running a single-node Elasticsearch instance in Docker, and it stops for some reason every few weeks. I'd like to find the root cause and how to fix it.
The last few log lines before it stopped:
2022-07-31T01:38:00.006473091Z {"@timestamp":"2022-07-31T01:38:00.006Z", "log.level": "INFO", "message":"Successfully completed [ML] maintenance task: triggerDeleteExpiredDataTask", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][ml_utility][T#2]","log.logger":"org.elasticsearch.xpack.ml.MlDailyMaintenanceService","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-07-31T22:24:39.764755178Z {"@timestamp":"2022-07-31T22:24:39.764Z", "log.level": "WARN", "message":"invalid internal transport message format, got (4d,47,4c,4e), [Netty4TcpChannel{localAddress=/172.19.0.4:9300, remoteAddress=/192.241.216.109:51310, profile=default}], closing connection", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][transport_worker][T#39]","log.logger":"org.elasticsearch.transport.TcpTransport","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:30:00.002045204Z {"@timestamp":"2022-08-01T01:30:00.000Z", "log.level": "INFO", "message":"starting SLM retention snapshot cleanup task", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][trigger_engine_scheduler][T#1]","log.logger":"org.elasticsearch.xpack.slm.SnapshotRetentionTask","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:30:00.002101071Z {"@timestamp":"2022-08-01T01:30:00.001Z", "log.level": "INFO", "message":"there are no repositories to fetch, SLM retention snapshot cleanup task complete", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][trigger_engine_scheduler][T#1]","log.logger":"org.elasticsearch.xpack.slm.SnapshotRetentionTask","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:38:00.001081255Z {"@timestamp":"2022-08-01T01:38:00.000Z", "log.level": "INFO", "message":"triggering scheduled [ML] maintenance tasks", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][generic][T#10]","log.logger":"org.elasticsearch.xpack.ml.MlDailyMaintenanceService","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:38:00.001590810Z {"@timestamp":"2022-08-01T01:38:00.001Z", "log.level": "INFO", "message":"Deleting expired data", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][generic][T#10]","log.logger":"org.elasticsearch.xpack.ml.action.TransportDeleteExpiredDataAction","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:38:00.005447894Z {"@timestamp":"2022-08-01T01:38:00.005Z", "log.level": "INFO", "message":"Successfully deleted [0] unused stats documents", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][generic][T#9]","log.logger":"org.elasticsearch.xpack.ml.job.retention.UnusedStatsRemover","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:38:00.005894448Z {"@timestamp":"2022-08-01T01:38:00.005Z", "log.level": "INFO", "message":"Completed deletion of expired ML data", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][ml_utility][T#2]","log.logger":"org.elasticsearch.xpack.ml.action.TransportDeleteExpiredDataAction","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:38:00.005980875Z {"@timestamp":"2022-08-01T01:38:00.005Z", "log.level": "INFO", "message":"Successfully completed [ML] maintenance task: triggerDeleteExpiredDataTask", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][ml_utility][T#2]","log.logger":"org.elasticsearch.xpack.ml.MlDailyMaintenanceService","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:50:52.368071257Z {"@timestamp":"2022-08-01T01:50:51.387Z", "log.level": "INFO", "message":"Native controller process has stopped - no new native processes can be started", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"ml-cpp-log-tail-thread","log.logger":"org.elasticsearch.xpack.ml.process.NativeController","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:50:53.215204181Z
2022-08-01T01:50:53.215646084Z ERROR: Elasticsearch exited unexpectedly
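For reference, these lines come straight from the container's stdout; a minimal way to pull them (assuming the container name elastic from the compose file below) is:

docker logs --timestamps --tail 50 elastic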
My Docker Compose config:
elastic:
  image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTIC_VERSION}
  container_name: elastic
  environment:
    - discovery.type=single-node
    - ES_HEAP_SIZE=40g
    - LS_HEAP_SIZE=40g
    - 'ES_JAVA_OPTS=-Xms40g -Xmx40g'
    - ELASTIC_USERNAME=elastic
    - ELASTIC_PASSWORD=${ELASTIC_PASSWORD:-}
  volumes:
    - ./elastic/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    - ./elastic:/usr/share/elasticsearch/data
  ports:
    - "9200:9200"
    - "9300:9300"
I guess the message "Native controller process has stopped - no new native processes can be started" could give some clues, but I could only find startup-related issues, not cases where the node had already been running for several days.