Elastic Endpoint in a degraded state

I have over 200 agents installed on mostly Linux hosts and some windows which work as intended. However, all my macos workstations have a problem with the Elastic Defend integration. After running the command sudo /Library/Elastic/Endpoint/elastic-endpoint status I get the following:

- elastic-agent
  - status: (HEALTHY) Connected
- elastic-endpoint
  - status: (DEGRADED) Running
  - policy
    - actions:
      - configure_file_events: warning
        - No action taken
      - configure_network_events: warning
        - No action taken
      - configure_process_events: warning
        - No action taken
      - configure_response_actions: warning
        - No action taken
      - configure_yara_rule_loading: warning
        - No action taken
      - connect_kernel: warning
        - No action taken
      - detect_file_write_events: warning
        - No action taken
      - detect_network_events: warning
        - No action taken
      - detect_process_events: warning
        - No action taken
      - download_user_artifacts: failure
        - Failed to download user artifacts from fleet server [network error occurred], make sure the server URL is correct and that hosts can connect to it. Artifact endpoint-exceptionlist-macos-v1 is unavailable. Artifact endpoint-eventfilterlist-macos-v1 is unavailable. Artifact endpoint-trustlist-macos-v1 is unavailable
      - full_disk_access: warning
        - No action taken
      - workflow: failure
        - User artifacts failed to download, they are required to apply policy. Failed to execute all workflows: Invalid or unpermitted state encountered

How can I make the artifact endpoint-exceptionlist-macos-v1 available?

I also have Linux workstations on the same network which work as intended so it's not a network issue.

Thanks in advance!

The easiest thing that you can do that may fix this is modify the exception list that is applied to that policy. So go add/remove something from trusted apps etc that either applies to that specific policy or globally. This will trigger all of the user artifacts to be regenerated and tell the endpoints to download the new files instead.

Can you try that and see if it fixes the problem?

That doesn't seem to fix the problem. When I look a the logs I see the folowing:

[elastic_agent.endpoint_security][error] Artifacts.cpp:3467 Failed to download artifact endpoint-hostisolationexceptionlist-macos-v1 - cURL error

followed by:

[elastic_agent.endpoint_security][error] MessageHelpers.cpp:313 CURL error: Could not resolve hostname [Could not resolve host: MY_SERVER_ADDRESS]

can you ping and/or curl that server address from the host? Those errors are indicating a DNS resolution error

I tried curl the server on port 9200 and 8220. 9200 works but I get a 404 page not found on 8220. This seems weird to me because the enrollment uses this address and port and that works fine.

That seem unusual. There’s not really a reason that the hostname would resolve from that host on one program but not another. Just to confirm, you ran curl from the machine that was unhealthy? And the machine is still unhealthy with the test output command?

Yes I ran curl from the unhealthy machine and it stills shows up as degraded

Out of curiosity, how does the DNS to your server resolves? One or multiple addresses? IPv4, IPv6 or both?