How to reconfigure an Elastic Agent with an invalid Fleet Server URL

My Elastic Agents are running in a Kubernetes cluster (DaemonSet). For debugging purposes I configured a Fleet Server URL pointing to a WireMock instance (pod only) that was also running temporarily in the cluster.

After debugging I killed the WireMock pod and all cluster agents went offline. Then I realized that I had forgotten to change the Fleet URL back to the original one, so I applied the correct URL again.

Now all cluster agents are stuck on an outdated policy revision and nothing happens. Even restarting the agent pods does not fix the issue because they still pick up the old, invalid Fleet Server URL on restart. It seems that the agents must perform a check-in call before they can fetch the latest revision. Does anybody know how to fix this configuration issue? Thanks in advance.
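
For reference, these are the two places I compare: the pod template of the DaemonSet and the configuration a running agent reports. The namespace and object name are taken from the stock manifest and may differ in other setups:

# What the DaemonSet template would hand to a newly created pod
kubectl -n kube-system get daemonset elastic-agent -o yaml | grep -A1 'name: FLEET_URL'

# What a running agent is actually using
kubectl -n kube-system exec <agent pod> -- elastic-agent inspect | grep -A3 'hosts:'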

P.S.: Restarting the WireMock pod is not an option because it would get a different IP.

Hi,

You can update the Fleet Server URL in the Agent policy in Kibana. This should propagate the new URL to all agents assigned to that policy.
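
If you prefer the API over the UI, something along these lines should also work. This is only a sketch: the exact endpoint can differ between stack versions, and <kibana host>, <password> and the URL are placeholders you need to replace.

# Update the Fleet Server hosts via the Kibana Fleet settings API (verify the endpoint for your version)
curl -u elastic:<password> \
  -X PUT "http://<kibana host>:5601/api/fleet/settings" \
  -H "kbn-xsrf: true" -H "Content-Type: application/json" \
  -d '{"fleet_server_hosts": ["https://<correct fleet server url>"]}'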

If the agents are still not connecting, you may need to manually update the Fleet Server URL in the agent's configuration file (elastic-agent.yml) on each host. After updating the file, restart the Elastic Agent service.

Regards.

Hi yago82,

Many thanks for your reply :slight_smile:

I updated the agent policy in Kibana, choosing the working Fleet Server URL. This policy update creates a new revision of the agent policy. The agent overview in Kibana now shows that the agents are running with an outdated policy. Since the Elasticsearch URL is properly configured for these agents, they still send monitoring and log data. However, their current status is offline and they do not fetch the new policy revision. The agent log contains the following line:

{"log.level":"error","@timestamp":"<timestamp>,“log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":197},"message":"Cannot checkin in with fleet-server, retrying","log":{"source":"elastic-agent"},"error":{"message":"fail to checkin to fleet-server: all hosts failed: 1 error occurred:\n\t* requester 0/1 to host https://<wrong fleet server url> errored: Post \"https://<wrong fleet server url>/api/fleet/agents/<agent id>/checkin?\": dial tcp 1<wrong fleet server url>: connect: no route to host\n\n"},"request_duration_ns":3079800091,"failed_checkins":806,"retry_after_ns":467875157778,"ecs.version":"1.6.0"}

I found out that I am able to restart the Elastic Agent in the container via the command "elastic-agent restart" without losing the pod. Yeaah :slight_smile:
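
In case it helps someone else, this is how I run that restart from outside the pods (the namespace and the app=elastic-agent label are the values from the stock manifest; adjust them if your deployment differs):

# Restart the Elastic Agent process in every agent pod without deleting the pods
for pod in $(kubectl -n kube-system get pods -l app=elastic-agent -o name); do
  kubectl -n kube-system exec "$pod" -- elastic-agent restart
done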

But it seems that no configuration file inside the container contains the Fleet Server URL:

#> find /usr/share/elastic-agent -type f | xargs grep "<wrong fleet server url>" /dev/null | more

Only the log files contain the Fleet Server URL. Running the command "elastic-agent status" shows me:

┌─ fleet
│  └─ status: (FAILED) fail to checkin to fleet-server: all hosts failed: 1 error occurred:
│         * requester 0/1 to host https://<wrong fleet server url>/ errored: Post "https://<wrong fleet server url>/api/fleet/agents/<agent id>/checkin?": dial tcp <wrong fleet server url>: connect: no route to host
│
│
└─ elastic-agent
   └─ status: (HEALTHY) Running  

The command "elastic-agent inspect" shows me:

fleet:
  access_api_key: <api key>
  agent:
    id: <agent id>
  enabled: true
  host: localhost:5601
  hosts:
  - https://<wrong fleet server url>
  protocol: http
  ssl:
    renegotiation: never
    verification_mode: none
  timeout: 10m0s

So I tried to follow your advice and put the Fleet Server URL into the file /usr/share/elastic-agent/elastic-agent.yml, but I could not figure out the correct configuration from the documentation. Can you provide me with a correct configuration snippet? Maybe just like in my snippet above? Something like:

fleet:
  hosts:
  - https://<correct fleet server url>

Some further information regarding my deployment: as explained in Run Elastic Agent on Kubernetes managed by Fleet | Fleet and Elastic Agent Guide [8.11] | Elastic, I used the file elastic-agent-managed-kubernetes.yaml to set up my DaemonSet. The following environment variables are set:

env:
  - name: FLEET_ENROLL
    value: "1"
  - name: FLEET_INSECURE
    value: "true"
  - name: FLEET_URL
    value: "https://<correct fleet server url>"
  - name: FLEET_ENROLLMENT_TOKEN
    value: "<enrollment token agent policy>“
  - name: KIBANA_HOST
    value: "http://kibana:5601"
  - name: KIBANA_FLEET_USERNAME
    value: "elastic"
  - name: KIBANA_FLEET_PASSWORD
    value: "changeme"
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: ELASTIC_NETINFO
    value: "false"

It’s currently very frustrating. Hopefully someone has an idea how to fix it. Thanks in advance.

Hi,

To resolve this issue, you can probably try deleting the Elastic Agent pods in your Kubernetes cluster. This will force Kubernetes to create new pods with the updated environment variables.
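
For example (the namespace and label selector assume the stock elastic-agent-managed manifest; adjust them to your deployment):

# Delete the agent pods; the DaemonSet controller recreates them from the current pod template
kubectl -n kube-system delete pods -l app=elastic-agent

# Alternatively, roll the whole DaemonSet
kubectl -n kube-system rollout restart daemonset/elastic-agent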

Regards

Yeaah, that was my first guess too. But unfortunately they somehow get the same configuration as before after recreation.
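
One thing I still want to rule out is the agent state that the stock manifest persists on each node via a hostPath volume; if the old enrollment is stored there, freshly created pods would pick it up again. The paths below are my reading of the stock manifest and may differ in other setups:

# Inside an agent pod: the state directory that survives pod recreation
kubectl -n kube-system exec <agent pod> -- ls -la /usr/share/elastic-agent/state

# On the node itself: the hostPath that backs this directory in the stock manifest
ls -la /var/lib/elastic-agent-managed/kube-system/state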
