Timeout errors Elastic GitHub Workflow Runner

Hi all,

We host our code on GitHub and use GitHub workflows for our CI/CD pipeline. On each push to a feature branch, two services described in a docker compose file are started and some tests run. One of the two services is Elasticsearch, for which we use the image docker.elastic.co/elasticsearch/elasticsearch:8.5.3. The other depends on the Elasticsearch service and starts only when that service is healthy. Below is the definition of the elasticsearch service:

 elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.5.3
    environment:
      discovery.type: single-node
      ELASTIC_PASSWORD: ******
      ELASTIC_USERNAME: *******
      xpack.security.enabled: "false"
      xpack.security.enrollment.enabled: "false"
      xpack.security.http.ssl.enabled: "false"
      xpack.security.transport.ssl.enabled: "false"
      action.destructive_requires_name: "false"
    healthcheck:
      test: curl -s http://elasticsearch:9200 >/dev/null || exit 1
      interval: 10s
      timeout: 5s
      retries: 10
    ports:
      - 9200:9200
    ulimits:
      memlock:
        soft: -1
        hard: -1
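
A note on the healthcheck above: `curl -s` without `-f` succeeds as soon as port 9200 answers at all, so the dependent service can start while the cluster is still unable to serve index requests. A minimal sketch of a stricter readiness check follows, using only the Python standard library and the cluster health API (`_cluster/health` with `wait_for_status`). The function names, retry counts, and delays are hypothetical choices for illustration, not part of the original setup.

```python
import json
import time
import urllib.error
import urllib.request


def health_url(base_url: str, wait_for: str = "yellow", timeout_s: int = 30) -> str:
    """Build a _cluster/health URL that blocks until the given status is reached."""
    return (
        f"{base_url.rstrip('/')}/_cluster/health"
        f"?wait_for_status={wait_for}&timeout={timeout_s}s"
    )


def is_ready(health: dict) -> bool:
    """Treat the cluster as usable when it is yellow or green and the wait did not time out."""
    return health.get("status") in ("yellow", "green") and not health.get("timed_out", False)


def wait_for_cluster(base_url: str, attempts: int = 10, delay_s: float = 5.0) -> dict:
    """Poll cluster health until ready; raise with the last response otherwise.

    attempts and delay_s are arbitrary illustrative defaults.
    """
    last: dict = {}
    for _ in range(attempts):
        try:
            # The socket timeout is kept above the server-side wait of 30s.
            with urllib.request.urlopen(health_url(base_url), timeout=40) as resp:
                last = json.load(resp)
            if is_ready(last):
                return last
        except (urllib.error.URLError, OSError) as exc:
            # Connection refused, HTTP error status, or socket timeout.
            last = {"error": str(exc)}
        time.sleep(delay_s)
    raise RuntimeError(f"Elasticsearch not ready: {last}")
```

Calling `wait_for_cluster("http://elasticsearch:9200")` from a session-scoped pytest fixture would make the run fail early with the last health response, rather than with a read timeout inside an individual test.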

On each push to GitHub, those two services are brought up and some tests written with the pytest framework run. Some of these tests access Elasticsearch.

Until about a week ago everything was fine: the tests ran and passed. Out of the blue, we started experiencing timeout errors, and our tests are now failing because of them.

failed on setup with "elastic_transport.ConnectionTimeout: Connection timeout caused by: ConnectionTimeout(Connection timeout caused by: ReadTimeoutError(HTTPConnectionPool(host='elasticsearch', port=9200): Read timed out. (read timeout=30)))"

We gradually increased the timeout from 10 to 30 seconds, but it made no difference.

Has anyone else experienced such a problem in a similar setup (GitHub, Docker Compose, Elasticsearch, etc.)?

Do you have any idea how we can find the real underlying reason for this?

Any suggestions are more than welcome :)

Thanks !!

We have the same issue as well. This is so annoying.

So, the problem was related to the available disk space on the machine where the GitHub runner runs. I am going to elaborate on how we found this, and afterwards on how we fixed it.

Using an awesome tool called tmate (tmate-io/tmate: Instant Terminal Sharing), which we plugged into our workflow via the "Debugging with tmate" action from the GitHub Marketplace, we managed to start our services using docker compose and check the logs. We saw the following:

which indicates clearly that there is a problem with the available space on the hard drive.
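
For anyone who wants to check this before opening a debug session, here is a rough sketch that reports disk usage on the runner and compares it with Elasticsearch's default disk watermarks (85% low, 90% high, 95% flood stage). It assumes the failure is watermark-related, which the post above does not confirm, and that the path being checked is the filesystem Docker stores its data on. The function names are made up for illustration.

```python
import shutil

# Elasticsearch default disk watermarks, as percent of disk used.
# A cluster may override these, so treat them as assumptions.
LOW, HIGH, FLOOD = 85.0, 90.0, 95.0


def used_percent(total: int, free: int) -> float:
    """Percentage of the filesystem that is not free."""
    return 100.0 * (total - free) / total


def watermark_level(pct_used: float) -> str:
    """Map a usage percentage to the highest default watermark it has reached."""
    if pct_used >= FLOOD:
        return "flood_stage"
    if pct_used >= HIGH:
        return "high"
    if pct_used >= LOW:
        return "low"
    return "ok"


def check_disk(path: str = "/") -> str:
    """Print usage for the filesystem containing path and return the watermark level."""
    usage = shutil.disk_usage(path)
    pct = used_percent(usage.total, usage.free)
    level = watermark_level(pct)
    print(f"{path}: {pct:.1f}% used, {usage.free / 2**30:.1f} GiB free -> {level}")
    return level


if __name__ == "__main__":
    check_disk("/")
```

Running this as an early workflow step gives a quick signal: anything other than `ok` suggests that Elasticsearch may refuse to allocate shards or may block writes, which could surface in the client as timeouts.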

The way we fixed that was by using another action, ShubhamTatvamasi/free-disk-space-action, which let us reclaim some space. After that, both of our services launched and the tests ran successfully.

I hope this helps anyone who runs into something similar in the future.