Random exits every few weeks - Native controller process has stopped - no new native processes can be started

Hi all,

I'm running a single-node Elasticsearch instance in Docker, and it stops for some unknown reason every few weeks. I'd like to find the root cause and fix it.

The last few log lines before the exit:

2022-07-31T01:38:00.006473091Z {"@timestamp":"2022-07-31T01:38:00.006Z", "log.level": "INFO", "message":"Successfully completed [ML] maintenance task: triggerDeleteExpiredDataTask", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][ml_utility][T#2]","log.logger":"org.elasticsearch.xpack.ml.MlDailyMaintenanceService","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-07-31T22:24:39.764755178Z {"@timestamp":"2022-07-31T22:24:39.764Z", "log.level": "WARN", "message":"invalid internal transport message format, got (4d,47,4c,4e), [Netty4TcpChannel{localAddress=/172.19.0.4:9300, remoteAddress=/192.241.216.109:51310, profile=default}], closing connection", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][transport_worker][T#39]","log.logger":"org.elasticsearch.transport.TcpTransport","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:30:00.002045204Z {"@timestamp":"2022-08-01T01:30:00.000Z", "log.level": "INFO", "message":"starting SLM retention snapshot cleanup task", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][trigger_engine_scheduler][T#1]","log.logger":"org.elasticsearch.xpack.slm.SnapshotRetentionTask","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:30:00.002101071Z {"@timestamp":"2022-08-01T01:30:00.001Z", "log.level": "INFO", "message":"there are no repositories to fetch, SLM retention snapshot cleanup task complete", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][trigger_engine_scheduler][T#1]","log.logger":"org.elasticsearch.xpack.slm.SnapshotRetentionTask","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:38:00.001081255Z {"@timestamp":"2022-08-01T01:38:00.000Z", "log.level": "INFO", "message":"triggering scheduled [ML] maintenance tasks", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][generic][T#10]","log.logger":"org.elasticsearch.xpack.ml.MlDailyMaintenanceService","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:38:00.001590810Z {"@timestamp":"2022-08-01T01:38:00.001Z", "log.level": "INFO", "message":"Deleting expired data", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][generic][T#10]","log.logger":"org.elasticsearch.xpack.ml.action.TransportDeleteExpiredDataAction","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:38:00.005447894Z {"@timestamp":"2022-08-01T01:38:00.005Z", "log.level": "INFO", "message":"Successfully deleted [0] unused stats documents", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][generic][T#9]","log.logger":"org.elasticsearch.xpack.ml.job.retention.UnusedStatsRemover","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:38:00.005894448Z {"@timestamp":"2022-08-01T01:38:00.005Z", "log.level": "INFO", "message":"Completed deletion of expired ML data", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][ml_utility][T#2]","log.logger":"org.elasticsearch.xpack.ml.action.TransportDeleteExpiredDataAction","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:38:00.005980875Z {"@timestamp":"2022-08-01T01:38:00.005Z", "log.level": "INFO", "message":"Successfully completed [ML] maintenance task: triggerDeleteExpiredDataTask", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[5c656d40c91c][ml_utility][T#2]","log.logger":"org.elasticsearch.xpack.ml.MlDailyMaintenanceService","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:50:52.368071257Z {"@timestamp":"2022-08-01T01:50:51.387Z", "log.level": "INFO", "message":"Native controller process has stopped - no new native processes can be started", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"ml-cpp-log-tail-thread","log.logger":"org.elasticsearch.xpack.ml.process.NativeController","elasticsearch.cluster.uuid":"IEwjbAHFT_SMOJB8ocTC8w","elasticsearch.node.id":"sgYcgZ3CRDiJL0BGbcvMWQ","elasticsearch.node.name":"5c656d40c91c","elasticsearch.cluster.name":"docker-cluster"}
2022-08-01T01:50:53.215204181Z 
2022-08-01T01:50:53.215646084Z ERROR: Elasticsearch exited unexpectedly
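The next time it exits, I'm planning to check whether Docker or the kernel OOM killer terminated the container (these commands assume the container name `elastic` from my compose file below):

```shell
# Did Docker record an OOM kill, and what exit code did the container return?
docker inspect --format 'OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}' elastic

# Did the kernel OOM killer fire around the time of the exit?
dmesg -T | grep -iE 'out of memory|oom-kill|killed process'
```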

My Docker Compose config:

elastic:
        image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTIC_VERSION}
        container_name: elastic
        environment:
            - discovery.type=single-node
            - ES_HEAP_SIZE=40g
            - LS_HEAP_SIZE=40g
            - 'ES_JAVA_OPTS=-Xms40g -Xmx40g'
            - ELASTIC_USERNAME=elastic
            - ELASTIC_PASSWORD=${ELASTIC_PASSWORD:-}
        volumes:
            - ./elastic/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
            - ./elastic:/usr/share/elasticsearch/data
        ports:
            - "9200:9200"
            - "9300:9300"

I guess the message "Native controller process has stopped - no new native processes can be started" is the best clue, but everything I could find about it relates to start-up failures, not to a node that had already been running for several days.
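Since the JVM heap alone is 40g, I'm also wondering whether the host simply runs out of memory for the ML native processes, which live outside the JVM heap. If that turns out to be the cause, I'd try capping the container's memory and leaving headroom below it, or disabling ML entirely. A sketch of what I have in mind (the 64g limit and 31g heap are assumptions about my host, not tested values):

```yaml
elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTIC_VERSION}
    environment:
        # keep the heap well under the container limit so the ML
        # native processes and other off-heap memory still fit
        - 'ES_JAVA_OPTS=-Xms31g -Xmx31g'
        # or rule ML out entirely:
        # - xpack.ml.enabled=false
    # compose v2 syntax; v3 uses deploy.resources.limits instead
    mem_limit: 64g
```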