Waiting for a Fleet Server to connect… error

We are running into an issue with Fleet: on our Agents page, the Fleet Server is stuck in what appears to be a continuous reboot cycle.

We have terminated and redeployed the Fleet node and still run into this issue.

We've manually adjusted the Fleet and Elasticsearch URLs in the Fleet settings, with no luck there either.

We have a private link but also have a public link set up, so traffic filtering shouldn't be causing the problem.

We are on v7.13.2 of the stack (Cloud).

We are unable to get any Agents deployed until this is resolved.

Based on your description, I assume you are running on Elastic Cloud? And by Fleet node I assume you shut down the APM & Fleet node in the Cloud UI? That would have been my first "go to" fix to get it resolved.

Can you check the content of the "hosted elastic agent policy"? I've seen a case before where something happened during migration and it did not contain a specific id that is needed for the hosted fleet-server to pick it up. Could you share the content of this policy here (without credentials of course)?

Yes, we are on Elastic Cloud, and yes, we terminated the APM & Fleet node and re-enabled it, but that did not work.

As for the Cloud agent policy, if that is what you are referring to, it's below.

I notice the inputs section is empty, whereas on my test cluster it is filled in with Fleet Server details.

id: policy-elastic-agent-on-cloud
revision: 4
outputs:
  default:
    type: elasticsearch
    hosts:
      - 'https://{redacted}'
output_permissions:
  default:
    _fallback:
      cluster:
        - monitor
      indices:
        - names:
            - logs-*
            - metrics-*
            - traces-*
            - .logs-endpoint.diagnostic.collection-*
            - synthetics-*
          privileges:
            - auto_configure
            - create_doc
agent:
  monitoring:
    enabled: false
    logs: false
    metrics: false
inputs: []
fleet:
  hosts:
    - >-
      https://{redacted}

That policy is missing the fleet-server input, which means the Fleet Server integration was not installed into the Elastic Agent on Cloud policy as it should have been on first startup.

@nchaulet @ruflin ^
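
For anyone hitting the same symptom, the check described above (an empty inputs list means the hosted Fleet Server has nothing to pick up) can be sketched in a few lines of Python. This is only an illustration, not an official tool: `has_fleet_server_input` is a hypothetical helper, the policy is assumed to be already parsed into a dict, and the assumption that a healthy policy carries an input entry with `type: fleet-server` is based on what a working cluster shows, so verify it against your own policy.

```python
def has_fleet_server_input(policy: dict) -> bool:
    """Return True if the policy lists at least one input of type 'fleet-server'.

    Assumption: a healthy hosted-agent policy contains an input entry whose
    "type" key is "fleet-server". Check this against a working cluster.
    """
    inputs = policy.get("inputs") or []
    return any(
        isinstance(entry, dict) and entry.get("type") == "fleet-server"
        for entry in inputs
    )


# Policy from this thread, reduced to the keys relevant to the check.
broken_policy = {
    "id": "policy-elastic-agent-on-cloud",
    "revision": 4,
    "inputs": [],
}

# Hypothetical healthy policy; only the "type" key matters for this check.
healthy_policy = {
    "id": "policy-elastic-agent-on-cloud",
    "inputs": [{"type": "fleet-server"}],
}

print(has_fleet_server_input(broken_policy))   # False
print(has_fleet_server_input(healthy_policy))  # True
```

If the check returns False for your policy, that matches the situation in this thread, where the Fleet Server integration was never added to the policy.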

Thanks @blaker. We noticed that yesterday as well. We ended up manually installing the Fleet Server integration, then terminated and re-enabled APM & Fleet, and it did end up connecting.

Thanks everyone for the help.
