Agent stuck on "Updating" when enrolling

Hi everybody,

After updating our stack to version 7.13.2 we have been experiencing problems when trying to enroll agents to Fleet.

Since fleet-server was introduced, we added it to our stack and put it behind our reverse proxy (Caddy server) so that we can access it at fleet.ourdomain.com and let Caddy handle TLS.

When we try to enroll a new agent (on a Windows machine) with the following command:

.\elastic-agent.exe install -f --url=https://fleet.ourdomain.com:443 --enrollment-token=TOKEN

everything seems fine at first: the command line says the agent has enrolled and it appears in Kibana, but it stays in the "Updating" state forever.

When we look into the logs provided by the agent, we get the following:

{"log.level":"error","@timestamp":"2021-07-03T11:26:59.026+0200","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":205},"message":"Could not communicate with fleet-server Checking API will retry, error: status code: 400, fleet-server returned an error: BadRequest","ecs.version":"1.6.0"}

After trying out several solutions, we found the following one:

Instead of using the install command, we used the enroll command:

.\elastic-agent.exe enroll -f --url=https://fleet.ourdomain.com:443 --enrollment-token=TOKEN

So far we get exactly the same result: the agent enrolls but stays on "Updating". So next we actually install the agent by simply calling .\elastic-agent install; however, when the question about Fleet comes up, we answer that we don't want to enroll it, as we already did so previously.

After following those two steps, the agent shows up correctly as "Healthy" and works correctly.

Why isn't the "automatic" process working correctly? Are we doing something wrong here?

Thanks for your help !

Did you try to sniff decrypted traffic? I wonder if it isn't a matter of bad "user-agent" or missing HTTP headers.

Do you know a way to capture the SSL keys? The problem happens on Windows machines, and using the SSLKEYLOGFILE environment variable doesn't seem to work. We can see the packets in Wireshark but can't decrypt them for the moment.
Thanks

I was thinking rather about sniffing traffic behind Caddy.

Indeed, Caddy shows a 400 error as soon as we try to enroll an agent:

{
  "level": "error",
  "ts": 1625570358.3650916,
  "logger": "http.log.access.log2",
  "msg": "handled request",
  "request": {
    "remote_addr": "XX.XX.XX.XX:35998",
    "proto": "HTTP/1.1",
    "method": "POST",
    "host": "fleet.ourdomain.com:443",
    "uri": "/api/fleet/agents/1e848378-f9c3-4147-bb0a-2219d0edb697/checkin?",
    "headers": {
      "Content-Type": [
        "application/json"
      ],
      "Kbn-Xsrf": [
        "1"
      ],
      "Accept-Encoding": [
        "gzip"
      ],
      "User-Agent": [
        "Elastic Agent v7.13.2"
      ],
      "Content-Length": [
        "871"
      ],
      "Accept": [
        "application/json"
      ],
      "Authorization": [
        "ApiKey TW1sYmUzb0J4MzRKbVFGRzZndV86ZFlVcVkhgljkhjnxcjByR3VPSjk2dw=="
      ]
    },
    "tls": {
      "resumed": false,
      "version": 772,
      "cipher_suite": 4865,
      "proto": "",
      "proto_mutual": true,
      "server_name": "fleet.ourdomain.com"
    }
  },
  "common_log": "192.168.176.1 - - [06/Jul/2021:11:19:18 +0000] \"POST /api/fleet/agents/1e848378-f9c3-4147-bb0a-2219d0edb697/checkin? HTTP/1.1\" 400 52",
  "duration": 0.002426286,
  "size": 52,
  "status": 400,
  "resp_headers": {
    "X-Content-Type-Options": [
      "nosniff"
    ],
    "Date": [
      "Tue, 06 Jul 2021 11:19:18 GMT"
    ],
    "Content-Length": [
      "52"
    ],
    "Server": [
      "Caddy"
    ],
    "Content-Type": [
      "application/json; charset=utf-8"
    ]
  }
}
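
For anyone reading a lot of these entries, here is a minimal Python sketch that condenses one Caddy JSON access log entry into the fields that matter for this problem. It assumes the entry has the same shape as the one above and that the "version" and "cipher_suite" numbers are the standard TLS identifiers (772 is TLS 1.3, 4865 is TLS_AES_128_GCM_SHA256). The function name summarize_entry and the trimmed sample entry are only for illustration.

```python
import json

# Standard TLS protocol version numbers (decimal form of 0x0301..0x0304).
TLS_VERSIONS = {769: "TLS 1.0", 770: "TLS 1.1", 771: "TLS 1.2", 772: "TLS 1.3"}

# IANA identifiers of the TLS 1.3 cipher suites (decimal form of 0x1301..0x1303).
TLS13_CIPHER_SUITES = {
    4865: "TLS_AES_128_GCM_SHA256",
    4866: "TLS_AES_256_GCM_SHA384",
    4867: "TLS_CHACHA20_POLY1305_SHA256",
}


def summarize_entry(raw: str) -> dict:
    """Reduce one Caddy access log entry (a JSON string) to a short summary."""
    entry = json.loads(raw)
    request = entry.get("request", {})
    tls = request.get("tls", {})
    user_agent = request.get("headers", {}).get("User-Agent", [""])
    return {
        "method": request.get("method"),
        "uri": request.get("uri"),
        "status": entry.get("status"),
        "user_agent": user_agent[0],
        # Unknown numbers are kept as-is instead of being guessed.
        "tls_version": TLS_VERSIONS.get(tls.get("version"), tls.get("version")),
        "cipher_suite": TLS13_CIPHER_SUITES.get(
            tls.get("cipher_suite"), tls.get("cipher_suite")
        ),
    }


# Trimmed copy of the entry above, used as sample input.
SAMPLE = json.dumps({
    "request": {
        "method": "POST",
        "uri": "/api/fleet/agents/1e848378-f9c3-4147-bb0a-2219d0edb697/checkin?",
        "headers": {"User-Agent": ["Elastic Agent v7.13.2"]},
        "tls": {"version": 772, "cipher_suite": 4865},
    },
    "status": 400,
})

if __name__ == "__main__":
    for key, value in summarize_entry(SAMPLE).items():
        print(f"{key}: {value}")
```

Read this way, the TLS handshake itself looks fine (TLS 1.3 with a standard cipher suite), so the 400 is more likely produced at the HTTP level than by the TLS setup.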

The weird thing is that the id (1e848378...) does not match the id in the fleet.yaml file created on the agent side. In fact, it does not match any agent.
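
To check this across many log entries rather than by eye, a small Python sketch like the following could pull the agent id out of the check-in URI and compare it with the id the agent has locally. The URI pattern is taken from the log entry above; the function names and the placeholder local id are hypothetical, and the local id would have to be copied by hand from the agent's fleet.yaml.

```python
import re
from typing import Optional

# Check-in path as seen in the Caddy log: /api/fleet/agents/<uuid>/checkin
CHECKIN_RE = re.compile(r"^/api/fleet/agents/([0-9a-fA-F-]{36})/checkin")


def agent_id_from_uri(uri: str) -> Optional[str]:
    """Return the agent id embedded in a check-in URI, or None if absent."""
    match = CHECKIN_RE.match(uri)
    return match.group(1) if match else None


def same_agent(uri: str, local_agent_id: str) -> bool:
    """True when the id in the URI equals the locally known id (case-insensitive)."""
    found = agent_id_from_uri(uri)
    return found is not None and found.lower() == local_agent_id.lower()


if __name__ == "__main__":
    uri = "/api/fleet/agents/1e848378-f9c3-4147-bb0a-2219d0edb697/checkin?"
    # Placeholder value: replace with the id read from the agent's fleet.yaml.
    local_id = "00000000-0000-0000-0000-000000000000"
    print("id in request:", agent_id_from_uri(uri))
    print("matches local id:", same_agent(uri, local_id))
```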

Thanks for the help