Elastic Defend is not working


In the beginning it worked fine, but after I changed the IP address of the machine (which also disconnected it from the internet), Elastic Defend started reporting as unhealthy with this issue.
I changed the yml file and the Fleet Server IP, but that did not fix it.
Does Elastic Defend require an internet connection?


Can anyone please help me with this? It says Elasticsearch is down, but when I check port 9200, it works fine.

Does anyone have an idea? This is incredibly frustrating.

Hello,

Sounds like your changes haven't reached Elastic Defend, or they don't work. Either way, a starting point would be to run the following from an elevated command prompt:

"C:\Program Files\Elastic\Endpoint\elastic-endpoint.exe" test output

This will test whether Elasticsearch is reachable and will confirm where Endpoint is trying to connect (that is, whether the policy change has been applied or not).

At first, when I ran that command, it showed:

SSH remote key was not OK [SSL certificate problem: self signed certificate in certificate chain]

Then, after trying many solutions, I added http_ca.crt to the trusted certificates, and it now shows:
401 Unauthorized to the elasticsearch host

You're close to solving it.

If you're getting a 401, that means HTTPS is working. Check the Elasticsearch API key. Maybe it was revoked in the meantime?

I can see there is already an API key in this YAML file, but it is not working.
I cannot modify the Endpoint YAML file.


Here is the log, in case it is useful:

10:25:09.422
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][info] Logging.cpp:72 Logging directory cleaned up, current size: 62500934
10:25:13.992
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:25:15.992
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][warning] AgentContext.cpp:486 Endpoint is setting status to DEGRADED, reason: Unable to connect to output server
10:25:19.006
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:25:24.026
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:25:29.063
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][error] ElasticsearchClient.cpp:206 HTTP Status Code (401): {"error":{"root_cause":[{"type":"security_exception","reason":"unable to authenticate with provided credentials and anonymous access is not allowed for this request","additional_unsuccessful_credentials":"API key: unable to find apikey with id crWYEI8B1dqHVaRseWrV","header":{"WWW-Authenticate":["Basic realm=\"security\" charset=\"UTF-8\"","Bearer realm=\"security\"","ApiKey"]}}],"type":"security_exception","reason":"unable to authenticate with provided credentials and anonymous access is not allowed for this request","additional_unsuccessful_credentials":"API key: unable to find apikey with id crWYEI8B1dqHVaRseWrV","header":{"WWW-Authenticate":["Basic realm=\"security\" charset=\"UTF-8\"","Bearer realm=\"security\"","ApiKey"]}},"status":401}
10:25:29.063
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][info] BulkQueueConsumer.cpp:122 Will not attempt to connect to Elasticsearch for 56 more seconds due to 401 unauthorized connection response
10:25:29.063
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:25:34.082
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:25:36.069
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][warning] AgentContext.cpp:486 Endpoint is setting status to DEGRADED, reason: Unable to connect to output server
10:25:39.103
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:25:44.180
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:25:49.656
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:25:54.745
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:25:56.186
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][warning] AgentContext.cpp:486 Endpoint is setting status to DEGRADED, reason: Unable to connect to output server
10:25:59.945
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:26:05.108
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:26:09.426
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][info] Logging.cpp:72 Logging directory cleaned up, current size: 62507845
10:26:10.264
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
10:26:10.264
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][notice] BulkQueueConsumer.cpp:193 Elasticsearch connection is down
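The 401 body in the log above names the API key id that Elasticsearch could not find. A minimal Python sketch for pulling that id out of such a body, so it can be compared with the keys listed in Kibana; the function name and the shortened sample body are illustrative assumptions, not part of any Elastic tooling:

```python
import json
import re
from typing import Optional


def missing_api_key_id(body: str) -> Optional[str]:
    """Return the API key id named in a 401 security_exception body, if any."""
    error = json.loads(body).get("error", {})
    detail = error.get("additional_unsuccessful_credentials", "")
    match = re.search(r"unable to find apikey with id (\S+)", detail)
    return match.group(1) if match else None


# Shortened version of the response body from the Endpoint log.
sample = json.dumps({
    "error": {
        "type": "security_exception",
        "reason": "unable to authenticate with provided credentials "
                  "and anonymous access is not allowed for this request",
        "additional_unsuccessful_credentials":
            "API key: unable to find apikey with id crWYEI8B1dqHVaRseWrV",
    },
    "status": 401,
})

print(missing_api_key_id(sample))  # crWYEI8B1dqHVaRseWrV
```

If that id does not appear among the API keys under Stack Management, the message would be consistent with a key that was deleted, or with a cluster other than the one that issued it. That is a reading of the error text, not something confirmed in this thread.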

I've never had to update the API key, so I'm also a little stuck here. I'll come back once I learn how to do it. Maybe in the meantime someone else will chime in.

First of all, you should never edit the YAML; even if you could, it would soon be overwritten by the stack.

Go to Fleet, find your policy, click on Settings, and check the assigned output

The output configurations are showing up on the Fleet

The API key shows up under stack management

However, this page tells me the "Default" key used for my "Default" output is Managed, so I'm not allowed to change it (at least not here).

I have only one output, the default one, and I never changed it.

It still outputs this:

PS C:\Program Files\Elastic\Endpoint> .\elastic-endpoint.exe test output
Testing output connections using config file: [C:\Program Files\Elastic\Endpoint\elastic-endpoint.yaml]

Using proxy:

Elasticsearch server: https://192.168.71.130:9200
        Status: HTTP code 401: Unauthorized

Global artifact server: https://artifacts.security.elastic.co
        Status: Couldn't resolve host name [Could not resolve host: artifacts.security.elastic.co]

Fleet server: https://localhost:8221
        Status: Couldn't connect to server [Failed to connect to localhost port 8221 after 2636 ms: Couldn't connect to server]
        Help: Make sure the server address is correct and that hosts can connect to it
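When comparing several runs of `test output`, it can help to reduce the text to one status per server. A rough Python sketch, assuming the layout shown above ("<name> server: <url>" followed by an indented "Status:" line); the parser is a hypothetical helper, not something shipped with Endpoint:

```python
import re

# Matches lines such as "Fleet server: https://localhost:8221".
HEADER = re.compile(r"^(?P<name>[A-Za-z ]+ server): (?P<url>\S+)$")


def parse_test_output(text):
    """Map each server name to its URL and the Status line that follows it."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.group("name")
            results[current] = {"url": header.group("url"), "status": None}
        elif current and line.startswith("Status:"):
            results[current]["status"] = line[len("Status:"):].strip()
    return results


SAMPLE = """
Elasticsearch server: https://192.168.71.130:9200
        Status: HTTP code 401: Unauthorized

Global artifact server: https://artifacts.security.elastic.co
        Status: Couldn't resolve host name [Could not resolve host: artifacts.security.elastic.co]

Fleet server: https://localhost:8221
        Status: Couldn't connect to server [Failed to connect to localhost port 8221 after 2636 ms: Couldn't connect to server]
        Help: Make sure the server address is correct and that hosts can connect to it
"""

for name, info in parse_test_output(SAMPLE).items():
    print(f"{name}: {info['url']} -> {info['status']}")
```

Read this way, the three lines point at three different problems: an authentication failure for Elasticsearch, a DNS failure for the artifact server (expected with no internet), and a Fleet Server URL that points at localhost port 8221.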

There's something odd here: is it a self-hosted stack or Elastic Cloud? Elastic Cloud would never configure Endpoints to use an IP address...

Could you verify on the Fleet -> Settings -> Output that the "default" is really pointing to the Elasticsearch DB you're assuming your Endpoint should talk to?

Wait a second, Fleet server: https://localhost?? Are you sure the Elastic Agent is able to talk to the Fleet/stack? What is the output of "C:\Program Files\Elastic\Agent\elastic-agent.exe" status, and does the inspect command show the expected policy?

I followed a video to set these up.


I also just noticed that the agent is offline too, so I think I messed up.
However, I have a snapshot that can restore the initial state, which is the state from my first post, with the error SSH remote key was not OK [SSL certificate problem: self signed certificate in certificate chain].

10:59:18.008
elastic_agent
[elastic_agent][error] Unit state changed fleet-server-default-fleet-server-fleet_server-930b7b7a-1fea-43e2-89fe-004196af7437 (STARTING->FAILED): Error - could not start the HTTP server for the API: failed to listen on the named pipe \\.\pipe\UwGGXFL1il700DVAc6q-T-1Z9J1UjGMU.sock: open \\.\pipe\UwGGXFL1il700DVAc6q-T-1Z9J1UjGMU.sock: Access is denied.
10:59:20.309
elastic_agent
[elastic_agent][info] Unit state changed fleet-server-default-fleet-server-fleet_server-930b7b7a-1fea-43e2-89fe-004196af7437 (FAILED->STARTING): Starting
10:59:20.309
elastic_agent
[elastic_agent][info] Unit state changed fleet-server-default (FAILED->STARTING): Starting
10:59:20.422
elastic_agent
[elastic_agent][error] Unit state changed fleet-server-default-fleet-server-fleet_server-930b7b7a-1fea-43e2-89fe-004196af7437 (STARTING->FAILED): Error - could not start the HTTP server for the API: failed to listen on the named pipe \\.\pipe\UwGGXFL1il700DVAc6q-T-1Z9J1UjGMU.sock: open \\.\pipe\UwGGXFL1il700DVAc6q-T-1Z9J1UjGMU.sock: Access is denied.
10:59:20.422
elastic_agent
[elastic_agent][error] Unit state changed fleet-server-default (STARTING->FAILED): Error - could not start the HTTP server for the API: failed to listen on the named pipe \\.\pipe\UwGGXFL1il700DVAc6q-T-1Z9J1UjGMU.sock: open \\.\pipe\UwGGXFL1il700DVAc6q-T-1Z9J1UjGMU.sock: Access is denied.
10:59:22.413
elastic_agent
[elastic_agent][info] Unit state changed fleet-server-default-fleet-server-fleet_server-930b7b7a-1fea-43e2-89fe-004196af7437 (FAILED->STARTING): Starting
10:59:22.413
elastic_agent
[elastic_agent][info] Unit state changed fleet-server-default (FAILED->STARTING): Starting
10:59:22.453
elastic_agent
[elastic_agent][error] Unit state changed fleet-server-default-fleet-server-fleet_server-930b7b7a-1fea-43e2-89fe-004196af7437 (STARTING->FAILED): Error - could not start the HTTP server for the API: failed to listen on the named pipe \\.\pipe\UwGGXFL1il700DVAc6q-T-1Z9J1UjGMU.sock: open \\.\pipe\UwGGXFL1il700DVAc6q-T-1Z9J1UjGMU.sock: Access is denied.
10:59:22.453
elastic_agent
[elastic_agent][error] Unit state changed fleet-server-default (STARTING->FAILED): Error - could not start the HTTP server for the API: failed to listen on the named pipe \\.\pipe\UwGGXFL1il700DVAc6q-T-1Z9J1UjGMU.sock: open \\.\pipe\UwGGXFL1il700DVAc6q-T-1Z9J1UjGMU.sock: Access is denied.

The output config you've just applied LGTM; however, I can see that your Fleet Server is configured with an IP address whilst Endpoint was showing localhost. Perhaps Agent re-enrollment is needed after this change.

To be clear, this was the initial output, before I messed around:

Don't know if it is useful.

You've got Fleet Server running on port 8220 according to your earlier screenshot; who knows what service occupies 8221.

It seems to me that you haven't fixed your certificates; you've just silenced the SSL errors with ssl.verification_mode: "none". That's fine to "get it going", as long as you know what you're doing.
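To make the trade-off concrete, here is a Python analogy of what different verification modes turn on or off. It is only an illustration using Python's `ssl` module, assuming the mode names "full", "certificate" and "none"; it is not how Endpoint or Agent implement the setting, and `context_for` is a made-up helper:

```python
import ssl


def context_for(verification_mode, ca_file=None):
    """Build a client TLS context that mimics a given verification mode."""
    # With ca_file=None the system trust store is used instead of a custom CA.
    ctx = ssl.create_default_context(cafile=ca_file)
    if verification_mode == "full":
        # Defaults already verify the chain and the host name or IP.
        pass
    elif verification_mode == "certificate":
        # Chain is still verified, but the name in the URL is not checked.
        ctx.check_hostname = False
    elif verification_mode == "none":
        # Nothing is verified; check_hostname must be disabled first.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    else:
        raise ValueError(f"unsupported mode: {verification_mode!r}")
    return ctx


for mode in ("full", "certificate", "none"):
    ctx = context_for(mode)
    print(mode, ctx.verify_mode.name, ctx.check_hostname)
```

With "none", a client accepts any certificate from any server, which hides certificate problems rather than fixing them.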

My approach would be to set up Fleet and the output in Kibana first, then deploy Elastic Agent. This way the Agent should have the proper policy.

Maybe this will help you understand your configs.

Elastic Agent is the sole coordinator service on the target machine. It receives a policy from Fleet to know which Beats to run and how to configure them. Elastic Defend is a kind of special Beat: it runs as a standalone service instead of being a subprocess launched by the Agent service, but it is still managed by Agent. The Agent has to have a working connection with Fleet to receive the policy and forward it to Elastic Defend. If the Agent <-> Fleet connection is broken, Elastic Defend won't receive any policy updates.

The policy is the whole config, including the output configuration (Elasticsearch in your case). More importantly, every alert exception, event filter, etc. is delivered via policy too, so the communication path Kibana -> Fleet -> Agent -> Endpoint must be working to have a good Elastic Defend setup.

You're using self-signed certificates. Please remember that when you configure your deployment using raw IP addresses, you have to re-create the certificates too, so that they include the new IP address (in the Subject Alternative Name field, which is what most current TLS clients check, rather than the CN).
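For IP-based URLs, TLS clients generally match the address against the IP entries in the certificate's Subject Alternative Name. A small Python sketch of that check, working on the dictionary shape returned by `ssl.SSLSocket.getpeercert()` (which is only populated when the certificate was validated); the certificate contents below are invented for illustration and are not taken from this deployment:

```python
import ipaddress


def san_covers_ip(cert, ip):
    """Return True if the certificate lists `ip` as an 'IP Address' SAN entry."""
    wanted = ipaddress.ip_address(ip)
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "IP Address":
            continue
        try:
            if ipaddress.ip_address(value) == wanted:
                return True
        except ValueError:
            # Skip entries that are not parseable as an IP address.
            continue
    return False


# Hypothetical certificate issued before the machine's IP changed.
old_cert = {
    "subject": ((("commonName", "elasticsearch"),),),
    "subjectAltName": (
        ("DNS", "localhost"),
        ("IP Address", "127.0.0.1"),
    ),
}

print(san_covers_ip(old_cert, "127.0.0.1"))
print(san_covers_ip(old_cert, "192.168.71.130"))
```

In this made-up example the second check fails, which is the situation a strict client would reject after an IP change until the certificate is re-issued with the new address.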

I guess that once you changed the IP addresses, the Agent lost communication with Fleet, so it should be re-enrolled to get a working Fleet config.

I do not know whether the Elasticsearch API key is tied in any way to its IP or DNS name; my first thought was simply that it became invalid after your changes.

Yes, I think I will set it up again, and I will make a new cert with

./elasticsearch-certutil ca 

I hope that will work. I may reach out again if I run into trouble. Thanks for your time anyway!

Hi, I just realized something: before all of this, everything was OK. Then I cut off the internet, and this

"17:09:51.281
elastic_agent.endpoint_security
[elastic_agent.endpoint_security][error] MessageHelpers.cpp:313 CURL error: SSL peer certificate or SSH remote key was not OK [SSL certificate problem: self-signed certificate in certificate chain]"

issue happened. Does the default certificate require an internet connection? I am a student and still learning this stuff.
I followed this video for the setup, and in that video he didn't mess around with certs.

This is a funny question, but no worries :wink:

I bet you'll get something like "host is unreachable" without internet, but I can see that you've set up everything on localhost. I think the problem is that you've set up the services using the machine's external IP, which disappears when you cut off the internet unless it's a static IP (not assigned by DHCP). Try to set up everything explicitly on localhost or 127.0.0.1.

Another issue that might come up here is how the servers bind to the network. A TCP server might bind to "0.0.0.0", in which case it listens for incoming connections on any address, but it can also choose to bind to only a specific address, such as 127.0.0.1 or [external IP]. Note also that the machine might have multiple physical adapters and thus multiple external IPs. I don't know how Elasticsearch and Fleet interpret their configs in this regard, i.e. whether they bind on all addresses or just the explicitly specified one.
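A quick way to see which addresses a service actually answers on is to try a plain TCP connection to each candidate. The Python sketch below demonstrates the idea with a throwaway listener bound to 127.0.0.1; the helper names are made up, and nothing here reflects how Elasticsearch or Fleet Server bind in practice:

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def listen_on(bind_host):
    """Start a TCP listener on bind_host using a free port chosen by the OS."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((bind_host, 0))
    server.listen(1)
    return server, server.getsockname()[1]


# A listener bound only to the loopback address.
server, port = listen_on("127.0.0.1")
print("while listening:", can_connect("127.0.0.1", port))
server.close()
print("after close:", can_connect("127.0.0.1", port))
```

On the affected machine, the same `can_connect` call could be pointed at each address in question, for example `can_connect("192.168.71.130", 9200)` and `can_connect("127.0.0.1", 9200)`, to see which of them Elasticsearch is reachable on.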

I believe this is the main problem here. I just wonder why the video doesn't have this problem... alas.