Heartbeat Browser Monitor Zip URL: not a valid zip file

Hi,

I'm trying to use the Heartbeat Browser monitor with the Zip URL source, but am running into a weird issue.

Has anyone gotten Zip URL to work with GitLab zip URLs?

I get the following error when Heartbeat tries to use the file:

{
   "level":"debug",
   "timestamp":"2021-12-16T20:30:00.233Z",
   "logger":"processors",
   "caller":"processing/processors.go:203",
   "message":"<see below>"
}
Publish event: {
    "@timestamp": "0001-01-01T00:00:00.000Z",
    "@metadata": {
        "beat": "heartbeat",
        "type": "_doc",
        "version": "7.16.1"
    },
    "monitor": {
        "type": "browser",
        "timespan": {
            "lt": "2021-12-16T20:35:00.000Z",
            "gte": "2021-12-16T20:30:00.230Z"
        },
        "status": "down",
        "id": "app-zipurl-test",
        "name": "ZIPURL Test"
    },
    "error": {
        "message": "could not fetch for suite job: could not read file /tmp/elastic-synthetics-zip-2506258212 as zip: zip: not a valid zip file",
        "type": "io"
    },
    "event": {
        "dataset": "browser"
    },
    "observer": {
        "hostname": "heartbeat-synthetics-test-beat-heartbeat-5c8966c5f4-9qj62",
        "ip": ["10.42.2.176", "fe80::3469:8cff:fe1d:43d4"],
        "mac": ["36:69:8c:1d:43:d4"]
    },
    "ecs": {
        "version": "1.12.0"
    },
    "agent": {
        "name": "heartbeat-synthetics-test-beat-heartbeat-5c8966c5f4-9qj62",
        "type": "heartbeat",
        "version": "7.16.1",
        "hostname": "heartbeat-synthetics-test-beat-heartbeat-5c8966c5f4-9qj62",
        "ephemeral_id": "b5fae110-3650-4047-a911-5d21a313ce3e",
        "id": "79fb9dfd-1058-4cf1-8ae0-86853da70cca"
    }
}

The error:

could not fetch for suite job: could not read file /tmp/elastic-synthetics-zip-2506258212 as zip: zip: not a valid zip file

I am using GitLab to host the repo, and have confirmed that GitLab is at least returning a 200 for the request:

192.168.247.5 - app [16/Dec/2021:15:35:00 -0500] "HEAD /synthetics/app/-/archive/main/app-main.zip HTTP/1.1" 302 0 "" "Go-http-client/1.1" -
192.168.247.5 - app [16/Dec/2021:15:35:00 -0500] "HEAD /users/sign_in HTTP/1.1" 200 0 "https://internal.domain.com/synthetics/app/-/archive/main/app-main.zip" "Go-http-client/1.1" -
192.168.247.5 - app [16/Dec/2021:15:35:00 -0500] "GET /synthetics/app/-/archive/main/app-main.zip HTTP/1.1" 302 111 "" "Go-http-client/1.1" -
192.168.247.5 - app [16/Dec/2021:15:35:00 -0500] "GET /users/sign_in HTTP/1.1" 200 9654 "https://internal.domain.com/synthetics/app/-/archive/main/app-main.zip" "Go-http-client/1.1" 3.22

Here is the Heartbeat config:

heartbeat:
    monitors:
    - id: app-zipurl-test
      name: ZIPURL Test
      params:
        password: ${MONITOR_PASSWORD}
        url: https://internal.domain.com/
        username: ${MONITOR_USERNAME}
      schedule: 0 */5 * * * ? *
      source:
        zip_url:
          folder: synthetic-tests
          password: ${APP_GIT_PASSWORD}
          ssl:
            verification_mode: none
          url: https://internal.domain.com/synthetics/app/-/archive/main/app-main.zip
          username: ${APP_GIT_USERNAME}
      type: browser

Project Structure:

GitLab/ (GitLab base)
  synthetics/ (GitLab Group)
    app/ (git project)
      main/ (git branch)
        synthetics-tests/ (folder in which .ts files exist)

Effectively the same structure as https://github.com/elastic/synthetics-demo/tree/main/todos, but replace elastic with synthetics

Looking at the sample requests from your post (duplicated below)

192.168.247.5 - app [16/Dec/2021:15:35:00 -0500] "HEAD /synthetics/app/-/archive/main/app-main.zip HTTP/1.1" 302 0 "" "Go-http-client/1.1" -
192.168.247.5 - app [16/Dec/2021:15:35:00 -0500] "HEAD /users/sign_in HTTP/1.1" 200 0 "https://internal.domain.com/synthetics/app/-/archive/main/app-main.zip" "Go-http-client/1.1" -
192.168.247.5 - app [16/Dec/2021:15:35:00 -0500] "GET /synthetics/app/-/archive/main/app-main.zip HTTP/1.1" 302 111 "" "Go-http-client/1.1" -
192.168.247.5 - app [16/Dec/2021:15:35:00 -0500] "GET /users/sign_in HTTP/1.1" 200 9654 "https://internal.domain.com/synthetics/app/-/archive/main/app-main.zip" "Go-http-client/1.1" 3.22

It looks like the request is being redirected to an additional sign-in step. Have you tried pulling the zip file via curl? The username/password is for HTTP basic auth. If you have some other sort of SSO, that won't work currently. Is there a way you can use HTTP basic auth?
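The 200 on /users/sign_in with a 9654-byte body in the log above suggests the saved file may be an HTML sign-in page rather than an archive. A minimal Python sketch for checking what was actually downloaded, assuming you have the response body as bytes (for example, from a file saved by curl). The helper names `looks_like_zip` and `describe_download` are made up for illustration and are not part of Heartbeat:

```python
import io
import zipfile


def looks_like_zip(body: bytes) -> bool:
    """Return True if the bytes form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(body))


def describe_download(body: bytes) -> str:
    """Classify a downloaded payload as "zip", "html", or "unknown"."""
    if looks_like_zip(body):
        return "zip"
    # A sign-in redirect typically ends in an HTML document, not an archive.
    head = body[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

If `describe_download` reports "html" for the file fetched from the archive URL, the credentials were most likely not accepted and the sign-in page was saved instead.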

Thanks for pointing that out @Andrew_Cholakian1

So, it looks like GitLab doesn't support basic auth for archive downloads (per the GitLab API docs). I was able to find a workaround using their API with a private token, which I tested with curl:

curl -X GET "https://internal.domain.com/api/v4/projects/synthetics%2Fapp/repository/archive.zip?sha=main&private_token=<token>" --output archive.zip

Which worked.
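GitLab's API also accepts the token in a `PRIVATE-TOKEN` request header, which keeps it out of the URL. A rough Python sketch of the same archive request built that way, for testing outside Heartbeat only; `build_archive_request` is an illustrative name, and this does not show any Heartbeat setting:

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_archive_request(base_url: str, project_path: str, ref: str, token: str) -> Request:
    """Build a GitLab repository-archive request with the token in a header."""
    # The project path is URL-encoded, so "synthetics/app" becomes "synthetics%2Fapp".
    project = quote(project_path, safe="")
    query = urlencode({"sha": ref})
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project}/repository/archive.zip?{query}"
    # The token travels in a header instead of the query string.
    return Request(url, headers={"PRIVATE-TOKEN": token})
```

Calling `build_archive_request("https://internal.domain.com", "synthetics/app", "main", "<token>")` produces a request for the same path as the curl command above, without `private_token` in the query string. Passing the result to `urllib.request.urlopen` would perform the download.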

But this seems to be causing issues of its own in Heartbeat:

Publish event: {
    "@timestamp": "0001-01-01T00:00:00.000Z",
    "@metadata": {
        "beat": "heartbeat",
        "type": "_doc",
        "version": "7.16.1"
    },
    "monitor": {
        "status": "down",
        "id": "app-zipurl-test",
        "name": "ZIPURL Test",
        "type": "browser",
        "timespan": {
            "gte": "2021-12-17T20:50:00.086Z",
            "lt": "2021-12-17T20:55:00.000Z"
        }
    },
    "error": {
        "type": "io",
        "message": "could not fetch for suite job: could not check if zip source changed for https://internal.domain.com/api/v4/projects/synthetics%2Fapp/repository/archive.zip?sha=main&private_token=<token>: No ETag header in zip file response. Heartbeat requires an etag to efficiently cache downloaded code"
    },
    "event": {
        "dataset": "browser"
    },
    "observer": {
        "hostname": "heartbeat-synthetics-test-beat-heartbeat-7c8d944798-d645k",
        "ip": ["10.42.2.177", "fe80::9049:d4ff:fe28:656a"],
        "mac": ["92:49:d4:28:65:6a"]
    },
    "ecs": {
        "version": "1.12.0"
    },
    "agent": {
        "hostname": "heartbeat-synthetics-test-beat-heartbeat-7c8d944798-d645k",
        "ephemeral_id": "712a0793-b5ac-49a1-a4b8-ad4ed344f6bd",
        "id": "d76570be-63ba-44a1-8c41-28cdb1926ef6",
        "name": "heartbeat-synthetics-test-beat-heartbeat-7c8d944798-d645k",
        "type": "heartbeat",
        "version": "7.16.1"
    }
}
could not fetch for suite job: could not check if zip source changed for https://internal.domain.com/api/v4/projects/synthetics%2Fapp/repository/archive.zip?sha=main&private_token=<token>: No ETag header in zip file response. Heartbeat requires an etag to efficiently cache downloaded code

I'm not sure whether there is a way to work around this issue, or whether GitLab just can't be used currently, but Heartbeat is at least able to try to download the file now. (This is not ideal, since it is recommended to put the private_token value in a header, but I can't find a place to add headers in the config.)

What's weird, though, is that if I run the same curl command with -v to get headers, I do see an etag header in the response:

curl -v -X GET "https://internal.domain.com/api/v4/projects/synthetics%2Fapp/repository/archive.zip?sha=main&private_token=<token>" --output archive.zip
* TCP_NODELAY set
* Connected to internal.domain.com (192.168.11.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [1605 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
<snipped>
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x55a439012e60)
} [5 bytes data]
> GET /api/v4/projects/synthetics%2Fapp/repository/archive.zip?sha=main&private_token=<token>
> Host: internal.domain.com
> User-Agent: curl/7.66.0
> Accept: */*
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
} [5 bytes data]
< HTTP/2 200
< server: nginx
< date: Fri, 17 Dec 2021 21:40:02 GMT
< content-type: application/zip
< content-length: 3027
< accept-ranges: bytes
< cache-control: max-age=0, private, must-revalidate
< content-disposition: attachment; filename="app-main-57efb02223de445ac868dc8e9611c7e2931330d6.zip"
< content-transfer-encoding: binary
< etag: W/"12ae32cb1ec02d01eda3581b127c1fee"
< vary: Origin
< x-content-type-options: nosniff
< x-frame-options: SAMEORIGIN
< x-request-id: 01FQ55CVY5KC8Q7KXFMEXVEDC3
< x-runtime: 0.045815
< strict-transport-security: max-age=63072000
< referrer-policy: strict-origin-when-cross-origin
<
{ [3027 bytes data]
100  3027  100  3027    0     0  30575      0 --:--:-- --:--:-- --:--:-- 30575
* Connection #0 to host internal.domain.com left intact

What we'd like to do, eventually, is set up specific download options for GitHub and GitLab that generate the right URL and don't require you to figure this all out. Thanks for bearing with us!

Yeah, it does seem that GitLab is a bit dicey in terms of auth. We may have to implement a custom solution for it. The best workaround for now is probably to have a CI task publish a zip artifact somewhere that does support basic auth.

Not sure if you're a Go developer, but we'd gladly take a patch to provide better GitLab support! The etag is kind of important in that, without it, we have to download the zip every time, and probably checksum it, etc., to see if it's different.

I suppose we could also have some logic around the last-changed time, but an etag is so much more proper.
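To illustrate why the etag matters for caching, here is a minimal Python sketch of an etag-based change check. It is not Heartbeat's actual implementation (that is written in Go); the names `zip_changed`, `normalize_etag`, and `MissingETagError` are hypothetical, and treating a weak etag such as W/"..." as equal to its strong form is an assumption made for this example:

```python
from typing import Mapping, Optional


class MissingETagError(Exception):
    """Raised when a response carries no ETag to compare against."""


def normalize_etag(value: str) -> str:
    """Drop the weak-validator prefix so W/"abc" and "abc" compare equal."""
    value = value.strip()
    return value[2:] if value.startswith("W/") else value


def zip_changed(cached_etag: Optional[str], headers: Mapping[str, str]) -> bool:
    """Return True when the archive should be downloaded again."""
    # HTTP header names are case-insensitive, so match them that way.
    lowered = {k.lower(): v for k, v in headers.items()}
    etag = lowered.get("etag")
    if not etag:
        raise MissingETagError("no ETag header in zip file response")
    if cached_etag is None:
        # Nothing cached yet, so the first download is always needed.
        return True
    return normalize_etag(etag) != normalize_etag(cached_etag)
```

With a check like this, the zip only needs to be fetched again when the server reports a different etag. Without an etag, the only options are to download every time or to fall back on something like a checksum.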

So, with etag, do you know where it pulls it from? From GitLab I see it as a response header, but from testing with GitHub, I don't see an etag response header, so I'm assuming it's pulling it from some other place?

Try running curl -I -L against https://github.com/elastic/synthetics-demo/archive/refs/heads/main.zip and look at the final response (after the 302 redirect): there is an etag there. So, we only use the etag on the final request in a redirect situation (which is the only correct way to use it).
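The same idea can be sketched in Python. This is only an illustration of "read the etag from the last response in the redirect chain", not how Heartbeat does it; `final_etag` and its `open_fn` parameter are made-up names, and the sketch issues a GET and reads only the headers rather than sending a HEAD request:

```python
import urllib.request
from typing import Callable, Optional


def final_etag(url: str, open_fn: Callable = urllib.request.urlopen) -> Optional[str]:
    """Follow redirects and return the ETag of the last response, if any."""
    # urlopen follows 3xx redirects by default, so the response object
    # returned here describes the final hop, not the first 302.
    with open_fn(url) as resp:
        return resp.headers.get("ETag")
```

The `open_fn` argument exists only so the function can be exercised with a stand-in response object instead of a real network call.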

Thanks for the additional information. Looking into this a bit further, it looks like this might require a new release of GitLab. The gitlab-pages merge request "feat: implement ETag support for zip serving" (!588) was merged ~1 day ago and, if I understand it correctly, should include the etag in the initial request.
