Data Extraction Service (tika/openresty) crashes on second container start

Hi guys

I’m encountering a reproducible issue with the data-extraction-service (0.3.5) when running it via Docker Compose.

The service (tika + openresty inside the container) starts fine the first time I bring the stack up, but after I stop and start the containers again (without changing anything), the extraction-service container reports both tika and openresty as crashed.

Subsequent restarts keep failing until I recreate the container from scratch (docker compose down -v && docker compose up).
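
The sequence described above can be sketched as a small shell helper. This is only an illustration: the `run` and `repro` function names and the `DRY_RUN` variable are hypothetical, the `-d` flag is an assumption (the stack may equally be started in the foreground), and `extraction-service` is the service name from the compose file in this post.

```shell
# Hypothetical helper: print each command, then run it unless DRY_RUN=1.
run() {
  printf '+ %s\n' "$*"
  [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Reproduction sketch: first start works, stop/start leaves tika and openresty crashed.
repro() {
  run docker compose up -d                                # first start: services come up
  run docker compose stop                                 # stop without removing containers
  run docker compose start                                # second start: crash is reported here
  run docker compose ps extraction-service                # container state after the restart
  run docker compose logs --tail=50 extraction-service    # service status lines from the container
}
```

Calling `repro` from the directory containing the compose file runs the sequence; `DRY_RUN=1 repro` only prints the commands.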

Here is my configuration:

Docker-compose:

  extraction-service:
    depends_on:
      elasticsearch:
        condition: service_healthy
    image: docker.elastic.co/integrations/data-extraction-service:0.3.5
    container_name: extraction-service
    ports:
      - 8090:8090
    volumes:
      - ./elastic-connectors/files:/app/files
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8090/ping"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    restart: unless-stopped

Logs:

Service 'All': Status
Service `hwdrivers' needs non existent service `dev'
Service `machine-id' needs non existent service `dev'
 * Caching service dependencies ... [ ok ]
Runlevel: sysinit
Runlevel: nonetwork
Runlevel: default
Runlevel: shutdown
Runlevel: boot
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed/wanted
Dynamic Runlevel: manual
 * service tika added to runlevel boot
 * service openresty added to runlevel boot
Service 'Tika': Starting ...
 * Starting tika ... [ ok ]
Service 'Openresty': Starting ...
 * Starting openresty ... [ ok ]
Service 'All': Status
Runlevel: sysinit
Runlevel: nonetwork
Runlevel: default
Runlevel: shutdown
Runlevel: boot
 tika                                                            [  crashed  ]
 openresty                                                       [  crashed  ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed/wanted
Dynamic Runlevel: manual
 * rc-update: tika already installed in runlevel `boot'; skipping
 * rc-update: openresty already installed in runlevel `boot'; skipping
Service 'Tika': Starting ...

Related S3 connector config:

connectors:
-
  connector_id: "${S3CONFIG_CONNECTOR_ID}"
  service_type: "s3"

extraction_service:
  host: http://host.docker.internal:${ES_LOCAL_EXTRACTION_PORT} #8090
  timeout: 30
  use_file_pointers: false
  stream_chunk_size: 65536
  shared_volume_dir: '/app/files'

elasticsearch:
  host: "http://host.docker.internal:${ES_LOCAL_PORT}" #9200
  username: "${ES_LOCAL_USERNAME}"
  password: "${ES_LOCAL_PASSWORD}"
  log_level: INFO
  request_timeout: 120
  max_wait_duration: 60

S3 connector logs:

[FMWK][15:12:12][CRITICAL] Expected to find a running instance of data extraction service at http://host.docker.internal:8090 but failed. Server disconnected.

The images I’m using are:

elasticsearch/elasticsearch:8.17.1
integrations/elastic-connectors:8.17.1
integrations/data-extraction-service:0.3.5

Please tell me how I can fix this. Thank you very much.

Hi @SPRINGPEACHVINH,

Can you share any error/warning messages from the Tika and Openresty logs? The paths are available here.

Let us know!