Docker container randomly stopping with no error log

I have set up a development environment on my machine (Mac M1 Ultra), using the ARM Docker images. My full setup notes are on my blog: Elastic Search 8.2.3 + Kibana + Enterprise Search — Developer Env Setup Notes (Docker only) | by Kristan 'Krispy' Uccello | Jun, 2022 | Medium

The Enterprise Search container works for about 60 seconds and then exits with no error logs. Here is the last output from the container:

Overwriting the default Enterprise Search configuration file: /usr/share/enterprise-search/config/enterprise-search.yml (if it fails, please make sure it is writeable)

Found java executable in PATH

Java version detected: 11.0.11 (major version: 11)

Enterprise Search is starting...

WARNING: An illegal reflective access operation has occurred

WARNING: Illegal reflective access by org.jruby.javasupport.binding.ConstantField (file:/usr/share/enterprise-search/lib/war/lib/jruby-core-9.3.3.0-complete.jar) to field sun.nio.cs.US_ASCII.INSTANCE

WARNING: Please consider reporting this to the maintainers of org.jruby.javasupport.binding.ConstantField

WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations

WARNING: All illegal access operations will be denied in a future release

[2022-06-15T22:39:40.890+00:00][8][4004][app-server][INFO]: Elastic Enterprise Search version=8.2.3, JRuby version=9.3.3.0, Ruby version=2.6.8, Rails version=5.2.7

[2022-06-15T22:39:43.157+00:00][8][4004][app-server][INFO]: Performing pre-flight checks for Elasticsearch running on https://es01:9200...

[2022-06-15T22:39:43.879+00:00][8][4004][app-server][INFO]: [pre-flight] Elasticsearch cluster is ready

[2022-06-15T22:39:43.882+00:00][8][4004][app-server][INFO]: [pre-flight] Successfully connected to Elasticsearch

[2022-06-15T22:39:43.952+00:00][8][4004][app-server][INFO]: [pre-flight] Successfully loaded Elasticsearch plugin information for all nodes

[2022-06-15T22:39:43.972+00:00][8][4004][app-server][INFO]: [pre-flight] Elasticsearch running with an active basic license

[2022-06-15T22:39:44.077+00:00][8][4004][app-server][INFO]: [pre-flight] Elasticsearch API key service is enabled

[2022-06-15T22:39:44.097+00:00][8][4004][app-server][INFO]: [pre-flight] Elasticsearch will be used for authentication

[2022-06-15T22:39:44.102+00:00][8][4004][app-server][INFO]: Elasticsearch looks healthy and configured correctly to run Enterprise Search

[2022-06-15T22:39:44.104+00:00][8][4004][app-server][INFO]: Performing pre-flight checks for Kibana running on http://kibana:5601...

[2022-06-15T22:39:44.300+00:00][8][4004][app-server][INFO]: [pre-flight] Successfully connected to Kibana

[2022-06-15T22:39:45.205+00:00][8][4004][app-server][INFO]: Kibana looks healthy and configured correctly to run Enterprise Search

This behavior started after I set up an engine in App Search through Kibana and pointed it at a domain to crawl. Once it started crawling the domain, the container crashed, and it has not recovered since.

Thoughts? Ideas? I'm not sure what is exactly failing here.

Hi Kristan!

Judging by the logs, I don't see anything particularly suspicious; it should just work.

I think there are a couple things that can be done to investigate further:

  1. Switch log_level to debug and see how far it gets. I'm curious whether it always fails at the same point or at random lines.
  2. Thinking about Docker itself: how much RAM and disk space is available to the container when it starts? Could Docker be terminating it for crossing some limit?

Meanwhile, I'll check with the team whether there are known issues of a similar type connected with M1/ARM images. We do test our stack on M1/ARM, so it should just work out of the box, but it never hurts to double check :)

Hi @Kristan_Uccello, this must be really frustrating. Adding to what @Artem_Shelkovnikov said, I'd also try to inspect the failed container if possible:

docker container ls -a lists all containers, including stopped ones.
docker container inspect failed_container_hash returns a JSON response with a State section that might help clarify the situation. Example:

"State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "driver failed programming external connectivity on endpoint spec-elasticsearch-1 (e64174cf45ac2841d1e32cb718372eac6811144917b48e1c787a39282f394643): Bind for 0.0.0.0:9200 failed: port is already allocated",
            "StartedAt": "2022-06-03T08:43:53.426081814Z",
            "FinishedAt": "2022-06-09T10:30:18.9265641Z"
        }

I think it's at least worth a try.
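
For reference, the State fields above can also be read programmatically. The following is a minimal Python sketch, not an official tool: summarize_state and inspect_state are hypothetical helper names, and the code only assumes the State keys shown in the example (Running, OOMKilled, ExitCode, Error). It relies on the common convention that an exit code above 128 means the process was terminated by signal number (ExitCode - 128), so 137 corresponds to signal 9.

```python
import json
import signal
import subprocess


def summarize_state(state: dict) -> str:
    """Return a one-line guess at why a container stopped, given its State dict."""
    if state.get("Running"):
        return "container is still running"
    if state.get("OOMKilled"):
        return "killed for exceeding its memory limit (OOMKilled is true)"
    if state.get("Error"):
        return "Docker reported an error: " + state["Error"]
    exit_code = state.get("ExitCode", 0)
    if exit_code > 128:
        # Convention: exit codes above 128 mean "terminated by signal (code - 128)".
        signum = exit_code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name} (exit code {exit_code})"
    if exit_code != 0:
        return f"exited with code {exit_code}"
    return "exited normally"


def inspect_state(container: str) -> dict:
    """Run `docker container inspect` and return the State section of the first result."""
    out = subprocess.run(
        ["docker", "container", "inspect", container],
        check=True, capture_output=True, text=True,
    ).stdout
    # The command prints a JSON array with one object per inspected container.
    return json.loads(out)[0]["State"]
```

With Docker available, print(summarize_state(inspect_state("failed_container_hash"))) would print the summary for one stopped container. The OOMKilled check comes first on purpose, since an OOM kill also shows up as exit code 137.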

@Artem_Shelkovnikov @maryna.cherniavska

I switched the log level to debug, and no errors are logged before the container stops (the last entries in the log are the crawl activity for the single domain):

[2022-06-16T15:56:14.163+00:00][8][5060][es][DEBUG]: {
  "request": {
    "url": "https://es01:9200/.ent-search-actastic-app_search_accounts_v11/_search?request_cache=true",
    "method": "get",
    "headers": {
      "Authorization": "[FILTERED]",
      "Content-Type": "application/json",
      "x-elastic-product-origin": "enterprise-search",
      "User-Agent": "Faraday v1.8.0"
    },
    "params": null,
    "body": "{\"query\":{\"bool\":{}},\"sort\":[\"_doc\"],\"size\":1,\"from\":0,\"seq_no_primary_term\":true}"
  },
  "response": {
    "status": 200,
    "headers": {
      "x-elastic-product": "Elasticsearch",
      "content-type": "application/json",
      "content-length": "749"
    },
    "body": "{\"took\":2,\"timed_out\":false,\"_shards\":{\"total\":1,\"successful\":1,\"skipped\":0,\"failed\":0},\"hits\":{\"total\":{\"value\":1,\"relation\":\"eq\"},\"max_score\":null,\"hits\":[{\"_index\":\".ent-search-actastic-app_search_accounts_v11\",\"_id\":\"62aa385b0ba5b28255fc8fd6\",\"_seq_no\":0,\"_primary_term\":1,\"_score\":null,\"_source\":{\"id\":\"62aa385b0ba5b28255fc8fd6\",\"created_at\":\"2022-06-15T19:51:55Z\",\"updated_at\":\"2022-06-15T19:51:55Z\",\"limit_overrides\":{},\"metered_plan_limit_overrides\":{},\"seeding_sample_engine\":null,\"onboarding_state\":\"create_engine\",\"name\":\"enterprise_search\",\"authorization_strategies\":null,\"telemetry_last_sent_at\":null,\"api_log_disabled_at\":null,\"analytics_log_disabled_at\":null,\"crawler_log_disabled_at\":null,\"audit_log_disabled_at\":null},\"sort\":[0]}]}}"
  },
  "duration": 6.6,
  "stack": [
    "lib/actastic/schema.class:55:in `search'",
    "lib/actastic/relation.class:538:in `block in search'",
    "lib/apm_helpers.class:52:in `es_action_instrument'",
    "lib/apm_helpers.class:57:in `actastic_instrument'",
    "lib/actastic/relation.class:518:in `instrument'",
    "lib/actastic/relation.class:537:in `search'",
    "lib/actastic/relation.class:225:in `find_each'",
    "lib/actastic/relation.class:307:in `to_a'",
    "lib/actastic/relation.class:307:in `load'",
    "lib/actastic/relation.class:302:in `to_a'",
    "lib/actastic/relation.class:323:in `each'",
    "lib/actastic/relation.class:216:in `first'",
    "lib/actastic/relation.class:216:in `first_with_limit'",
    "lib/actastic/actastic_record/class_methods.class:391:in `first'",
    "shared_togo/lib/shared_togo/crawler/event_log_formatter.class:8:in `skip_ingest?'",
    "shared_togo/lib/shared_togo/events/event_log_formatter.class:9:in `call'",
    "loco_moco/lib/crawler/loco_moco/api/config.class:14:in `<<'",
    "crawler/lib/crawler/api/config.class:322:in `output_event'",
    "loco_moco/lib/crawler/loco_moco/api/config.class:88:in `output_event'",
    "crawler/lib/crawler/event_generator.class:330:in `log'",
    "crawler/lib/crawler/event_generator.class:305:in `log_event'",
    "crawler/lib/crawler/event_generator.class:296:in `log_crawl_event'",
    "crawler/lib/crawler/event_generator.class:292:in `log_url_event'",
    "crawler/lib/crawler/event_generator.class:175:in `url_discover'",
    "crawler/lib/crawler/coordinator.class:609:in `check_discovered_url'",
    "crawler/lib/crawler/coordinator.class:470:in `block in add_urls_to_backlog'",
    "crawler/lib/crawler/coordinator.class:456:in `each'",
    "crawler/lib/crawler/coordinator.class:456:in `add_urls_to_backlog'",
    "crawler/lib/crawler/coordinator.class:360:in `extract_and_enqueue_html_links'",
    "crawler/lib/crawler/coordinator.class:321:in `extract_and_enqueue_links'",
    "crawler/lib/crawler/coordinator.class:283:in `block in process_crawl_result'",
    "crawler/lib/crawler/coordinator.class:283:in `process_crawl_result'",
    "crawler/lib/crawler/coordinator.class:257:in `execute_crawl_task'",
    "crawler/lib/crawler/coordinator.class:242:in `block in run_crawl_loop'"
  ]
}
[2022-06-16T15:56:14.168+00:00][8][5116][es][DEBUG]: {
  "request": {
    "url": "https://es01:9200/.ent-search-actastic-app_search_accounts_v11/_search?request_cache=true",
    "method": "get",
    "headers": {
      "Authorization": "[FILTERED]",
      "Content-Type": "application/json",
      "x-elastic-product-origin": "enterprise-search",
      "User-Agent": "Faraday v1.8.0"
    },
    "params": null,
    "body": "{\"query\":{\"bool\":{}},\"sort\":[\"_doc\"],\"size\":1,\"from\":0,\"seq_no_primary_term\":true}"
  },
  "response": {
    "status": 200,
    "headers": {
      "x-elastic-product": "Elasticsearch",
      "content-type": "application/json",
      "content-length": "749"
    },
    "body": "{\"took\":0,\"timed_out\":false,\"_shards\":{\"total\":1,\"successful\":1,\"skipped\":0,\"failed\":0},\"hits\":{\"total\":{\"value\":1,\"relation\":\"eq\"},\"max_score\":null,\"hits\":[{\"_index\":\".ent-search-actastic-app_search_accounts_v11\",\"_id\":\"62aa385b0ba5b28255fc8fd6\",\"_seq_no\":0,\"_primary_term\":1,\"_score\":null,\"_source\":{\"id\":\"62aa385b0ba5b28255fc8fd6\",\"created_at\":\"2022-06-15T19:51:55Z\",\"updated_at\":\"2022-06-15T19:51:55Z\",\"limit_overrides\":{},\"metered_plan_limit_overrides\":{},\"seeding_sample_engine\":null,\"onboarding_state\":\"create_engine\",\"name\":\"enterprise_search\",\"authorization_strategies\":null,\"telemetry_last_sent_at\":null,\"api_log_disabled_at\":null,\"analytics_log_disabled_at\":null,\"crawler_log_disabled_at\":null,\"audit_log_disabled_at\":null},\"sort\":[0]}]}}"
  },
  "duration": 20.9,
  "stack": [
    "lib/actastic/schema.class:55:in `search'",
    "lib/actastic/relation.class:538:in `block in search'",
    "lib/apm_helpers.class:52:in `es_action_instrument'",
    "lib/apm_helpers.class:57:in `actastic_instrument'",
    "lib/actastic/relation.class:518:in `instrument'",
    "lib/actastic/relation.class:537:in `search'",
    "lib/actastic/relation.class:225:in `find_each'",
    "lib/actastic/relation.class:307:in `to_a'",
    "lib/actastic/relation.class:307:in `load'",
    "lib/actastic/relation.class:302:in `to_a'",
    "lib/actastic/relation.class:323:in `each'",
    "lib/actastic/relation.class:216:in `first'",
    "lib/actastic/relation.class:216:in `first_with_limit'",
    "lib/actastic/actastic_record/class_methods.class:391:in `first'",
    "shared_togo/lib/shared_togo/crawler/event_log_formatter.class:8:in `skip_ingest?'",
    "shared_togo/lib/shared_togo/events/event_log_formatter.class:9:in `call'",
    "loco_moco/lib/crawler/loco_moco/api/config.class:14:in `<<'",
    "crawler/lib/crawler/api/config.class:322:in `output_event'",
    "loco_moco/lib/crawler/loco_moco/api/config.class:88:in `output_event'",
    "crawler/lib/crawler/event_generator.class:330:in `log'",
    "crawler/lib/crawler/event_generator.class:305:in `log_event'",
    "crawler/lib/crawler/event_generator.class:296:in `log_crawl_event'",
    "crawler/lib/crawler/event_generator.class:292:in `log_url_event'",
    "crawler/lib/crawler/event_generator.class:175:in `url_discover'",
    "crawler/lib/crawler/event_generator.class:189:in `url_discover_denied'",
    "crawler/lib/crawler/coordinator.class:604:in `check_discovered_url'",
    "crawler/lib/crawler/coordinator.class:470:in `block in add_urls_to_backlog'",
    "crawler/lib/crawler/coordinator.class:456:in `each'",
    "crawler/lib/crawler/coordinator.class:456:in `add_urls_to_backlog'",
    "crawler/lib/crawler/coordinator.class:360:in `extract_and_enqueue_html_links'",
    "crawler/lib/crawler/coordinator.class:321:in `extract_and_enqueue_links'",
    "crawler/lib/crawler/coordinator.class:283:in `block in process_crawl_result'",
    "crawler/lib/crawler/coordinator.class:283:in `process_crawl_result'",
    "crawler/lib/crawler/coordinator.class:257:in `execute_crawl_task'",
    "crawler/lib/crawler/coordinator.class:242:in `block in run_crawl_loop'"
  ]
}
[2022-06-16T15:56:14.183+00:00][8][5068][es][DEBUG]: {
  "request": {
    "url": "https://es01:9200/enterprise-search-engine-creditkarmapub/_bulk",
    "method": "post",
    "headers": {
      "Authorization": "[FILTERED]",
      "Content-Type": "application/json",
      "x-elastic-product-origin": "enterprise-search",
      "User-Agent": "Faraday v1.8.0"
    },
    "params": null,
    "body": "removed to save space in this message"
  },
  "response": {
    "status": 200,
    "headers": {
      "x-elastic-product": "Elasticsearch",
      "content-type": "application/json",
      "content-length": "260"
    },
    "body": "{\"took\":45,\"errors\":false,\"items\":[{\"index\":{\"_index\":\".ent-search-engine-documents-creditkarmapub\",\"_id\":\"62ab528a329d3c9a199894ed\",\"_version\":1,\"result\":\"created\",\"_shards\":{\"total\":2,\"successful\":2,\"failed\":0},\"_seq_no\":69,\"_primary_term\":9,\"status\":201}}]}"
  },
  "duration": 52.6,
  "stack": [
    "lib/swiftype/document_storage/backends/managed_index_backend.class:74:in `bulk_action'",
    "lib/swiftype/document_storage/backends/managed_index_backend.class:26:in `block in save_documents'",
    "lib/swiftype/document_storage/backends/managed_index_backend.class:21:in `each'",
    "lib/swiftype/document_storage/backends/managed_index_backend.class:21:in `save_documents'",
    "lib/swiftype/documents.class:94:in `save_documents'",
    "app/models/search_index/documents_concern.class:51:in `save_documents'",
    "lib/abstract_upsert_document_service.class:46:in `upsert'",
    "loco_moco/lib/crawler/output_sink/app_search.class:272:in `upsert_app_search_document'",
    "loco_moco/lib/crawler/output_sink/app_search.class:156:in `process_success'",
    "loco_moco/lib/crawler/output_sink/app_search.class:74:in `process_crawl_result'",
    "loco_moco/lib/crawler/output_sink/app_search.class:34:in `block in write'",
    "loco_moco/lib/crawler/output_sink/app_search.class:33:in `write'",
    "crawler/lib/crawler/coordinator.class:431:in `output_crawl_result'",
    "crawler/lib/crawler/coordinator.class:306:in `process_crawl_result'",
    "crawler/lib/crawler/coordinator.class:257:in `execute_crawl_task'",
    "crawler/lib/crawler/coordinator.class:242:in `block in run_crawl_loop'"
  ]
}
[2022-06-16T15:56:14.736+00:00][8][5068][crawler][INFO]: [crawl:62aa5c5b0ba5b27abefc8fee] [primary] Processed crawl results from the page 'https://www.creditkarma.com/resources' via the app_search output. Outcome: success. Message: Indexed the document into App Search with doc_id=62ab528a329d3c9a199894ed.

Docker RAM is set to 8 GB, with each container limited to 1 GB.

A portion of my docker-compose.yml:

...
    environment:
      - "secret_management.encryption_keys=['REDACTED']"
      - "allow_es_settings_modification=true"
      - "elasticsearch.host='https://es01:9200'"
      - "elasticsearch.username='elastic'"
      - "elasticsearch.password='REDACTED'"
      - "elasticsearch.ssl.enabled=true"
      - "elasticsearch.ssl.certificate_authority='/usr/share/enterprise-search/config/certs/ca/ca.crt'"
      - "elasticsearch.ssl.certificate='/usr/share/enterprise-search/config/certs/es01/es01.crt'"
      - "elasticsearch.ssl.key='/usr/share/enterprise-search/config/certs/es01/es01.key'"
      - "elasticsearch.ssl.verify=true"
      - "kibana.host='http://kibana:5601'"
      - "kibana.external_url='http://localhost:5601'"
      - "secret_session_key=REDACTED"
      - "secret_management.enforce_valid_encryption_keys=false"
      - "ent_search.ssl.enabled=false"
      - "ent_search.external_url='http://localhost:3002'"
      - "ent_search.listen_host=0.0.0.0"
      - "ent_search.listen_port=3002"
      - "log_level=debug"
    mem_limit: 1gb
...

@maryna.cherniavska Good call on inspecting the state. Below is the State of the Enterprise Search container after it exits. It looks like an out-of-memory issue, since OOMKilled is true:

"State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": true,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "",
            "StartedAt": "2022-06-16T16:35:29.400521377Z",
            "FinishedAt": "2022-06-16T16:37:13.389957092Z",
            "Health": {
                "Status": "unhealthy",
                "FailingStreak": 0,
                "Log": [
                    {
                        "Start": "2022-06-16T16:36:22.459438555Z",
                        "End": "2022-06-16T16:36:22.965434055Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2022-06-16T16:36:32.97288792Z",
                        "End": "2022-06-16T16:36:33.875854587Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2022-06-16T16:36:43.893081217Z",
                        "End": "2022-06-16T16:36:44.745158676Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2022-06-16T16:36:54.758904833Z",
                        "End": "2022-06-16T16:36:56.311644042Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2022-06-16T16:37:06.319005422Z",
                        "End": "2022-06-16T16:37:09.42528309Z",
                        "ExitCode": 0,
                        "Output": ""
                    }
                ]
            }
        },

I changed the mem_limit value from 1gb to 4gb, and the container now seems stable and is no longer being killed.

Thank you for the helpful hints!


Glad that you were able to figure out what happened to the container @Kristan_Uccello!

While it's possible to run Enterprise Search on hosts with 1 GB of RAM, some features work only when 4 GB of RAM is available, for instance extracting text from downloaded files or generating thumbnails of document content in Workplace Search.

For now I would recommend a 4 GB RAM host for Enterprise Search. It should also work with 2 GB, but with some features unavailable.
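
The thresholds mentioned above can be turned into a quick check of a compose mem_limit value. This is a rough Python sketch: parse_mem_limit and check_enterprise_search_limit are hypothetical names, the unit parsing assumes b/k/m/g suffixes (optionally followed by b) treated as binary multiples, and the 2 GB and 4 GB cut-offs are taken only from the guidance in this thread, not from official sizing documentation.

```python
import re

GIB = 1024 ** 3
# Assumed multipliers for the unit suffixes accepted below.
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024 ** 2, "g": GIB}


def parse_mem_limit(value: str) -> int:
    """Convert a string such as '1gb', '512m' or '4g' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([bkmg]?)b?\s*", value.lower())
    if not match:
        raise ValueError(f"unrecognised memory limit: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def check_enterprise_search_limit(value: str) -> str:
    """Classify a mem_limit value against the thresholds discussed in this thread."""
    limit = parse_mem_limit(value)
    if limit >= 4 * GIB:
        return "ok: meets the 4 GB recommended in this thread"
    if limit >= 2 * GIB:
        return "partial: should run, but some features may be unavailable"
    return "risky: below 2 GB; a 1gb limit was OOM-killed during a crawl in this thread"


if __name__ == "__main__":
    for candidate in ("1gb", "2g", "4gb"):
        print(candidate, "->", check_enterprise_search_limit(candidate))
```

The check is deliberately coarse. Actual memory needs depend on which features are in use, so the result is a hint, not a guarantee.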

We also know that documentation is far from perfect when it comes to explaining these hardware limitations and we are looking into improving it.
