Endpoint 7.9 "Degraded and dashboards"

I looked back at a few machines that dropped offline again today; they did come back online after several hours. Several of the agents were pointed at the last host that went down for updates, but not all of them, as I have 4 nodes they can connect to. Only some of the agents have just stopped. The only pattern I can see is that the 2020-09 updates were just applied to all of the machines that are offline, but other machines also have 2020-09 and they work...

I started looking in Kibana at the last logs as the machines started dropping offline; they then stopped sending logs altogether today, some as of a few minutes ago. This was not expected...

[screenshot] This is the default config out of the box, with no changes to the system yet. Each of the offline nodes had high memory usage.

Metricbeat is bundled with Endpoint. After Elastic Agent and Elastic Endpoint are stopped it keeps running, along with Filebeat. I had already disabled the standalone Metricbeat on the endpoints in question beforehand, just for testing.
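For what it's worth, this is how I'm checking which of those processes are still alive (and roughly how much memory they're holding) after stopping the agent. Plain cmd, nothing Elastic-specific; the process names are just the ones I see in Task Manager:

C:\WINDOWS\system32>rem Each command lists the matching process with its PID and Mem Usage
C:\WINDOWS\system32>tasklist /FI "IMAGENAME eq metricbeat.exe"
C:\WINDOWS\system32>tasklist /FI "IMAGENAME eq filebeat.exe"
C:\WINDOWS\system32>tasklist /FI "IMAGENAME eq elastic-endpoint.exe"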

One of the last logs in the Ingest Manager:
"malware": {
"concerned_actions": [
"agent_connectivity",
"load_config",
"workflow",
"download_global_artifacts",
"download_user_artifacts",
"configure_malware",
"read_malware_config",
"load_malware_model",
"read_kernel_config",
"configure_kernel",
"detect_process_events",
"detect_file_write_events",
"connect_kernel",
"detect_file_open_events",
"detect_sync_image_load_events"
],
"status": "failure"
},
"streaming": {
"concerned_actions": [
"agent_connectivity",
"load_config",
"read_elasticsearch_config",
"configure_elasticsearch_connection",
"workflow"
],
"status": "success"
}
}
},
"status": "failure"

From the windows application log:
Faulting application name: elastic-endpoint.exe, version: 7.9.0.0, time stamp: 0x5f32bdd7
Faulting module name: elastic-endpoint.exe, version: 7.9.0.0, time stamp: 0x5f32bdd7
Exception code: 0xc0000005

From the local endpoint-xxx.log on the same machine that the above logs and snippet are from:
{"@timestamp":"2020-09-10T23:22:32.95811900Z","agent":{"id":"removed","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":1392,"name":"HttpLib.cpp"}}},"message":"HttpLib.cpp:1392 Establishing GET connection to [https://node3:9200/_cluster/health]","process":{"pid":5496,"thread":{"id":2140}}}
{"@timestamp":"2020-09-10T23:22:32.95811900Z","agent":{"id":"71cfd898-0cf9-47c5-a97d-bb8f3f3b1f9a","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"notice","origin":{"file":{"line":65,"name":"BulkQueueConsumer.cpp"}}},"message":"BulkQueueConsumer.cpp:65 Elasticsearch connection is down","process":{"pid":5496,"thread":{"id":2140}}}

If you want more detailed logs just tell me what you need. I will PM them unredacted, as I have a decent test bed to pull from. If you want an alpha guinea pig for a beta I'll do that as well.

@pierhugues @Kevin_Logan

Agent version 7.9.1 fixed some of the disconnect issues. Not sure what changed (I haven't looked). I still need to do a full restart of the cluster in an unsafe order to replicate a failure, to see if it fixed the reconnect after a machine is off for an extended period.

What would be really nice is a limit on the memory that Filebeat/Metricbeat can use. I'm seeing it consume 8 GB of RAM, which is the max we have on some machines. It's reasonable for it to take up 512 MB, maybe 1 GB on a larger file, but anything past that is detrimental to the user experience.

It has forced a soft crash on a few lower-end machines I have, due to it consuming everything the machine had. Any chance of putting a cap in place, or allowing a cap to be configured in the policy for a particular group?

For example, some first-generation Windows 10 embedded devices we have only have 4 GB of RAM. That is very tight for Windows 10 even on a good day. Having something chew up 1 GB would hurt those devices for a good long while, as they normally have very low-end CPUs.

After additional testing of 7.9.1 I have some more concerns. Default memory usage is really high, but only initially; it does drop after a bit, depending on file count. There is no obvious, easy spot to change the settings in Ingest Manager. I'm only using Ingest Manager for all testing, not going outside of it or modifying the yml files directly. I have not been able to test the 7.10 snapshot yet, so forgive me if this has already been addressed.

I'm still seeing degraded messages; now it's exclusive to Endpoint.

"malware": {
"concerned_actions": [
"agent_connectivity",
"load_config",
"workflow",
"download_global_artifacts",
"download_user_artifacts",
"configure_malware",
"read_malware_config",
"load_malware_model",
"read_kernel_config",
"configure_kernel",
"detect_process_events",
"detect_file_write_events",
"connect_kernel",
"detect_file_open_events",
"detect_sync_image_load_events"
],
"status": "failure"

Essentially, what Endpoint is doing in 7.9.1 is nothing at all. I dropped 40 known malware variants on a machine not more than an hour ago, knowing that it wouldn't catch any of them due to the failures indicated in the logs above.

Is there anything I can try to see if Endpoint will actually work, or should I stand by for the next update?

Hi @PublicName. Since you don't mind, can you PM me the Endpoint logs (c:\Program Files\Elastic\Endpoint\state\log\*), the Endpoint's config (c:\Program Files\Elastic\Endpoint\elastic-endpoint.yaml), and the Endpoint's latest payload response from applying that config/policy (I mean the full "degraded" message you shared a snippet of)? I'll look through them to see if I can determine what may be causing your failures.

As requested, you have 3 PMs, due to the length exceeding the 13000-character limit. I was unable to attach the files to the message since attachments have to be a jpg, jpeg, png, or gif.

I hope you see something I missed; I'm at a loss. The logs are repeatable on workgroup machines installed directly from the Microsoft ISO files for Home and Pro versions. From 1809 to 2004 I end up with the same message on each. It appears the driver never fully loads. The same happens on 2012 R2 up through Server 2019 as well. The disconnect message appears on multiple clusters, some being on the same layer-2 network and the same switch.
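For what it's worth, here is how I've been checking whether the driver actually loaded. I don't know the exact name Elastic Endpoint registers its filter under, so I just look for anything Elastic-looking in the output (elevated prompt required for fltmc):

C:\WINDOWS\system32>rem List the loaded minifilter drivers
C:\WINDOWS\system32>fltmc filters

C:\WINDOWS\system32>rem Or dump all drivers and look for an Elastic entry
C:\WINDOWS\system32>driverquery /FO LIST | findstr /i "elastic"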

I do see that the driver filter you are attempting to load is signed by Elastic and Microsoft. After looking at a random sampling on my workstation and a spare laptop, I checked the drivers from vendors like AMD, Nvidia, and Dell, and the only signer is Microsoft (as the hardware compatibility publisher) or the vendors themselves off of a trusted root like Symantec or DigiCert (the same CA you use). Not sure if that matters; I'm grasping at straws. It's really not worth me doing any debugging when you awesome folks are already well ahead of us users.

Thank you!

I looked through them and I see the issue with the Policy failure. If you go into the Security App's Administration tab and click on the "Configuration Status" for the failing host you should see a dialog pop up on the right side of the screen that lets you drill down into the policy and see the failure in a nice UI.

But, since you shared the payload document for Endpoint from Ingest Manager, I'll describe how to interpret it. The relevant portion is the Endpoint.policy.applied.actions array. One of the actions contains a failure (download_user_artifacts), which means your Endpoint is failing to download artifacts it needs from Kibana (and since the only artifacts a 7.9 Endpoint uses are exception lists, it's clear that is the artifact Endpoint cannot download).

The section you'd previously shared a snippet of was from Endpoint.policy.actions.configurations. The way to think of these two sections (actions and configurations) is that when Endpoint applies policy it performs many "actions" (e.g. download user artifacts, connect to the kernel driver, etc.) for the higher-level "configurations" (prevent malware, collect process events, etc.). The actions array lists the things Endpoint failed or succeeded in doing; the configurations portion maps those actions to the configurations they are relevant to. Hopefully that makes sense.

Can you look in the Endpoint logs to see why user artifacts are failing to download? The elastic-endpoint.yaml file contains information on the artifacts that are downloaded. If you search for the relative URL (/api/endpoint/artifacts/download/endpoint-exceptionlist-windows-v1) in Endpoint's logs you should hopefully see some log messages that point you to the issue. In this case since you've previously had issues with Kibana connections from Agent I suspect something similar is happening here.
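If grep isn't available on that machine, the built-in findstr works just as well. The path below is the default Endpoint log location; point it at whichever endpoint-*.log file is current:

C:\WINDOWS\system32>rem Search the Endpoint log for the exception list artifact by name
C:\WINDOWS\system32>findstr /C:"endpoint-exceptionlist-windows-v1" "c:\Program Files\Elastic\Endpoint\state\log\endpoint-000000.log"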

I'm not sure why this failure would cause Endpoint to fail to detect the malware samples you tested. I'd be happy to work through that too but we should get your Endpoint in a good working state before diving into that.

Activity log for the agent? I don't happen to see a Security Administration tab or a Configuration Status.

The exception list is empty on all clusters, as the option to add/save an exception is grayed out on each one for some reason. If I attempt to add an endpoint exception nothing will populate, and if I manually type one in I'm unable to save it. I can add a rule exception as a test to see if it will download and whether it ends in success vs. failure.

Results from the URL: considering the API is used, it's a little harder to check. Using the elastic user ends with:
{"statusCode":401,"error":"Unauthorized","message":"Unauthorized"}

Going back 1 level to see if I would get a file list:
{"statusCode":404,"error":"Not Found","message":"Not Found"}

Well, that would be a good reason to fail...

Is Endpoint set up like Carbon Black or Cylance where it's only active at runtime? That would explain some of it. I did use good old Metasploit as well on an unpatched box. Guess we should start with the little things first, as you said.

I'll let you know as soon as I get the rule added and tested on a few machines. See if I end up with the same or different results.

@ferullo
I'm unable to add any exceptions since I can't save, so I wasn't able to test whether just creating one would do the trick.

7.9.2 hasn't resolved the issue with the exception list. It still has the random disconnects where it will say the Elasticsearch connection is down. To be honest it almost looks like a permissions issue with the API key that gets generated, as it's failing to read the cluster health status.

Let me ask you something. Can you have username/password and API keys enabled on the same cluster? I ran into this a few weeks back, before Ingest Manager, and was too lazy to do much past trying it a few times; I never did get both of them working at the same time.

Now, we know Elasticsearch is fine with both, as we get logs from the agents. Is Kibana fine with it?
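For reference, this is roughly how I've been poking at it: hitting the same cluster once with basic auth and once with an API key, and using the Kibana status API as a quick check on that side. Hosts and key values are placeholders, and -k is only there because my nodes use an internal CA:

C:\WINDOWS\system32>rem Elasticsearch with basic auth (prompts for the password)
C:\WINDOWS\system32>curl -k -u elastic https://node3:9200/_cluster/health

C:\WINDOWS\system32>rem Same cluster with an API key
C:\WINDOWS\system32>curl -k -H "Authorization: ApiKey BASE64VALUE" https://node3:9200/_cluster/health

C:\WINDOWS\system32>rem Kibana with basic auth
C:\WINDOWS\system32>curl -k -u elastic https://kibana_server_name_goes_here:5601/api/status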

Sorry it's been a few days without a reply.

Given that it seems like you're having connectivity issues, let's work through your networking. There are two connections we need to validate for Endpoint. For both, API key authentication takes precedence over username/password authentication if both are in Endpoint's config.

Given that you hit errors in the past it's best to start with a fresh Agent and Endpoint install if possible so there is less in the Endpoint logs to go through. It would be helpful to know which connection is not working and what errors you're seeing.

Note that in the example commands below some specifics, like the API keys and URLs, will of course be different for you. Also, not all of the commands below are part of a native Windows installation. Hopefully, if the exact commands don't work for you, you'll be able to figure out some variant of them that works on your computer; if not, just ping back and we'll find a different command together.

Connection to Kibana
Endpoint connects to Kibana to download potentially large artifacts it needs to fully apply the policy. For example, for 7.9 this is how Endpoint downloads the Alert Exceptions to apply on macOS and Windows.

In Endpoint's config (c:\Program Files\Elastic\Endpoint\elastic-endpoint.yaml) you should see a snippet that looks like this:

fleet:
  api:
    access_api_key: BASE64VALUE
    kibana:
      host: example.com
      protocol: https
inputs:
  - artifact_manifest:
      artifacts:
        endpoint-exceptionlist-windows-v1:
          relative_url: /api/endpoint/artifacts/download/endpoint-exceptionlist-windows-v1/d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658

Based on that you can search Endpoint's logs for the relative_url to see what happened when Endpoint tried to download the artifact. On my machine these are the logs I see.

C:\WINDOWS\system32>grep endpoint-exceptionlist-windows-v1 "c:\Program Files\Elastic\Endpoint\state\log\endpoint-000000.log"
{"@timestamp":"2020-09-29T21:26:10.70243100Z","agent":{"id":"4b707d92-f692-4d70-9251-fa99fa06435c","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":2241,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:2241 Downloading artifact: endpoint-exceptionlist-windows-v1","process":{"pid":9832,"thread":{"id":2232}}}
{"@timestamp":"2020-09-29T21:26:10.70243100Z","agent":{"id":"4b707d92-f692-4d70-9251-fa99fa06435c","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":1440,"name":"HttpLib.cpp"}}},"message":"HttpLib.cpp:1440 Establishing GET connection to [https://example.com:443/api/endpoint/artifacts/download/endpoint-exceptionlist-windows-v1/d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658]","process":{"pid":9832,"thread":{"id":2232}}}
{"@timestamp":"2020-09-29T21:26:10.32287000Z","agent":{"id":"4b707d92-f692-4d70-9251-fa99fa06435c","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":497,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:497 Artifact endpoint-exceptionlist-windows-v1 successfully verified","process":{"pid":9832,"thread":{"id":2232}}}

Further, you can use Curl to manually try to download the same artifact. Make sure to pipe the output to something like xxd since the content downloaded isn't text.

C:\WINDOWS\system32>curl -H "Authorization: ApiKey BASE64VALUE" https://example.com:443/api/endpoint/artifacts/download/endpoint-exceptionlist-windows-v1/d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658 | xxd
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    22  100    22    0     0     22      0  0:00:01 --:--:--  0:00:01    41
00000000: 789c ab56 4acd 2b29 ca4c 2d56 b28a 8ead  x..VJ.+).L-V....
00000010: 0500 2719 0529                           ..'..)

C:\WINDOWS\system32>

Connection to Elasticsearch
Endpoint connects to Elasticsearch to store data that it generates.

In Endpoint's config file you should see a snippet that looks like this:

output:
  elasticsearch:
    api_key: raw:value
    hosts:
      - https://example.com:443

Based on that you can search Endpoint's logs to see what happens when it checks to see if it can send to Elasticsearch. If after checking the cluster health it sends data to the _bulk API then it is able to send data.

C:\WINDOWS\system32>grep -A 1 "_cluster/health" "c:\Program Files\Elastic\Endpoint\state\log\endpoint-000000.log"
{"@timestamp":"2020-09-29T21:26:16.38473600Z","agent":{"id":"4b707d92-f692-4d70-9251-fa99fa06435c","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":1440,"name":"HttpLib.cpp"}}},"message":"HttpLib.cpp:1440 Establishing GET connection to [https://example.com:443/_cluster/health]","process":{"pid":9832,"thread":{"id":9352}}}
{"@timestamp":"2020-09-29T21:26:16.45341500Z","agent":{"id":"4b707d92-f692-4d70-9251-fa99fa06435c","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":1440,"name":"HttpLib.cpp"}}},"message":"HttpLib.cpp:1440 Establishing POST connection to [https://example.com:443/_bulk]","process":{"pid":9832,"thread":{"id":9352}}}

C:\WINDOWS\system32>

You can also search for "documents to Elasticsearch" to see how many documents Endpoint is periodically sending.

C:\WINDOWS\system32>grep "documents to Elasticsearch" "c:\Program Files\Elastic\Endpoint\state\log\endpoint-000000.log" | head -n 4
{"@timestamp":"2020-09-29T21:26:17.55295400Z","agent":{"id":"4b707d92-f692-4d70-9251-fa99fa06435c","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":180,"name":"BulkQueueConsumer.cpp"}}},"message":"BulkQueueConsumer.cpp:180 Sent 8 documents to Elasticsearch","process":{"pid":9832,"thread":{"id":9352}}}
{"@timestamp":"2020-09-29T21:28:11.49117600Z","agent":{"id":"4b707d92-f692-4d70-9251-fa99fa06435c","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":180,"name":"BulkQueueConsumer.cpp"}}},"message":"BulkQueueConsumer.cpp:180 Sent 1 documents to Elasticsearch","process":{"pid":9832,"thread":{"id":9352}}}
{"@timestamp":"2020-09-29T21:28:13.63456900Z","agent":{"id":"4b707d92-f692-4d70-9251-fa99fa06435c","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":180,"name":"BulkQueueConsumer.cpp"}}},"message":"BulkQueueConsumer.cpp:180 Sent 227 documents to Elasticsearch","process":{"pid":9832,"thread":{"id":9352}}}
{"@timestamp":"2020-09-29T21:30:11.2557800Z","agent":{"id":"4b707d92-f692-4d70-9251-fa99fa06435c","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":180,"name":"BulkQueueConsumer.cpp"}}},"message":"BulkQueueConsumer.cpp:180 Sent 1 documents to Elasticsearch","process":{"pid":9832,"thread":{"id":9352}}}

C:\WINDOWS\system32>

From the configuration file snippet you can also generate a Curl request to see what happens when you manually try to connect to Elasticsearch. Notice that before using Curl you must base 64 encode the api_key value.

C:\WINDOWS\system32>python3 -c "import base64; print(base64.b64encode('raw:value'.encode('utf-8')))"
b'cmF3OnZhbHVl'

C:\WINDOWS\system32>curl -H "Authorization: ApiKey cmF3OnZhbHVl" https://example.com:443/_cluster/health
{"cluster_name":"0e0111df93d141b8997a992f385d2aa8","status":"green","timed_out":false,"number_of_nodes":3,"number_of_data_nodes":2,"active_primary_shards":42,"active_shards":84,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}
C:\WINDOWS\system32>

I will fire up a test VM straight off the ISO and see what I get as soon as I can get to it; it might be a few days. The errors come from several machines, try 50+ test machines; none of them are cloned, all are direct Windows WIM installs. I get valid, usable data from Metricbeat and Filebeat from all agents, even the ones that say failed. It's not an error I can reproduce at will, so it's hard to track down. What I'm not getting, and several other people on the forums are not getting, are Endpoint malware events. They are never sent to Elasticsearch, even with the 7.9.2 agent. I haven't been able to test the snapshot version yet.

The failed connections in my case line up with the degraded messages that show up in the Fleet part of Kibana.

For the Malware alert issue, have you tried testing with a version of Mimikatz? Endpoint detects malware when it is written or executed but not if it is just sitting on the filesystem.

Can you go to Security -> Administration and make sure the policy for your Endpoint is in a green/success state? If it isn't, you can click on the status and a dialog will appear on the right showing what worked and what didn't. Please share what isn't working.

Assuming malware detection is working, can you see if running or copying Mimikatz on the C:\ drive generates an alert?

Brand new Windows 10 LTSC machine, not even patched. Agent version 7.9.2, a new policy named "For_You_Ferullo" with 1 agent named TESTBOX and Endpoint enabled. Mimikatz downloaded, and of course Chrome hates it and flags it, so you have to allow it. Windows Defender disabled just to avoid it stepping in.

"Artifacts.cpp:2298 Failed to download artifact endpoint-exceptionlist-windows-v1 - Failure in an external software component"

{"@timestamp":"2020-10-02T00:22:41.78407700Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"error","origin":{"file":{"line":629,"name":"SyncKernelMessageManager.cpp"}}},"message":"SyncKernelMessageManager.cpp:629 Process ID 1608: [C:\mimikatz_trunk\x64\mimikatz.exe] is allowed due to message processing failure, error code -205","process":{"pid":8188,"thread":{"id":8548}}}
{"@timestamp":"2020-10-02T00:22:41.78407700Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"error","origin":{"file":{"line":629,"name":"SyncKernelMessageManager.cpp"}}},"message":"SyncKernelMessageManager.cpp:629 Process ID 1608: [C:\mimikatz_trunk\x64\mimikatz.exe] is allowed due to message processing failure, error code -205","process":{"pid":8188,"thread":{"id":8548}}}
{"@timestamp":"2020-10-02T00:22:41.72155100Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":746,"name":"FileScore.cpp"}}},"message":"FileScore.cpp:746 Sending alert for [C:\mimikatz_trunk\x64\mimikatz.exe]","process":{"pid":8188,"thread":{"id":9056}}}
{"@timestamp":"2020-10-02T00:22:41.72155100Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":746,"name":"FileScore.cpp"}}},"message":"FileScore.cpp:746 Sending alert for [C:\mimikatz_trunk\x64\mimikatz.exe]","process":{"pid":8188,"thread":{"id":9056}}}
{"@timestamp":"2020-10-02T00:23:11.79709900Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"error","origin":{"file":{"line":629,"name":"SyncKernelMessageManager.cpp"}}},"message":"SyncKernelMessageManager.cpp:629 Process ID 8616: [C:\mimikatz_trunk\x64\mimidrv.sys] is allowed due to message processing failure, error code -205","process":{"pid":8188,"thread":{"id":9056}}}
{"@timestamp":"2020-10-02T00:23:11.79709900Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"warning","origin":{"file":{"line":1047,"name":"Authenticode.cpp"}}},"message":"Authenticode.cpp:1047 WinVerifyTrust returned: 800b0101, errorExpired (C:\mimikatz_trunk\x64\mimidrv.sys)","process":{"pid":8188,"thread":{"id":4028}}}
{"@timestamp":"2020-10-02T00:23:11.95328800Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":746,"name":"FileScore.cpp"}}},"message":"FileScore.cpp:746 Sending alert for [C:\mimikatz_trunk\x64\mimidrv.sys]","process":{"pid":8188,"thread":{"id":8548}}}
{"@timestamp":"2020-10-02T00:23:48.13958800Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"error","origin":{"file":{"line":629,"name":"SyncKernelMessageManager.cpp"}}},"message":"SyncKernelMessageManager.cpp:629 Process ID 8616: [C:\mimikatz_trunk\Win32\mimikatz.exe] is allowed due to message processing failure, error code -205","process":{"pid":8188,"thread":{"id":9056}}}
{"@timestamp":"2020-10-02T00:23:48.13958800Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"error","origin":{"file":{"line":629,"name":"SyncKernelMessageManager.cpp"}}},"message":"SyncKernelMessageManager.cpp:629 Process ID 8616: [C:\mimikatz_trunk\Win32\mimikatz.exe] is allowed due to message processing failure, error code -205","process":{"pid":8188,"thread":{"id":9056}}}
{"@timestamp":"2020-10-02T00:23:48.73626800Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":746,"name":"FileScore.cpp"}}},"message":"FileScore.cpp:746 Sending alert for [C:\mimikatz_trunk\Win32\mimikatz.exe]","process":{"pid":8188,"thread":{"id":8548}}}
{"@timestamp":"2020-10-02T00:23:48.73626800Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":746,"name":"FileScore.cpp"}}},"message":"FileScore.cpp:746 Sending alert for [C:\mimikatz_trunk\Win32\mimikatz.exe]","process":{"pid":8188,"thread":{"id":8548}}}

A free-text search for mimikatz in Kibana against logs-* ends with 0 results, which is already known, as I'm not the only one missing the Endpoint malware logs. I did a joke search for "cute cat" with no results.

0 alerts are triggered, but that's expected since nothing from Endpoint is sent. Filebeat and Metricbeat logs are received with 0 issues. What I do find interesting is that the mimikatz process was killed and the exe deleted. Looking over the Defender logs I did not see that it was the one that stopped it, so maybe 7.9.2 did. The only reference I have in any log is in the lines above. Now to fix endpoint-exceptionlist-windows-v1 and the lack of any malware notices in Elastic.

It looks like Endpoint is in fact the one that prevented Mimikatz! In particular, this is the log that shows Endpoint prevented Mimikatz from running since Endpoint wouldn't send an alert unless it had prevented it.

{"@timestamp":"2020-10-02T00:22:41.72155100Z","agent":{"id":"5b1ac9c4-f401-4ad6-9586-6b7c8c124b05","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":746,"name":"FileScore.cpp"}}},"message":"FileScore.cpp:746 Sending alert for [C:\mimikatz_trunk\x64\mimikatz.exe]","process":{"pid":8188,"thread":{"id":9056}}}

Endpoint alerts are written to the index logs-endpoint.alerts-default. "default" is the namespace; if you've changed it in Ingest Manager, 7.9.0, 7.9.1, and 7.9.2 Endpoints won't recognize that change and will still send to the default namespace. A fix is in the works for that bug.

Do you see the alert in that index? If not, check Endpoint logs to see if it is able to communicate with Elasticsearch. Details how to do that are in an earlier comment in this thread.
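If it's easier, you can check that index from the command line with the same Curl approach as before. The index name below assumes the default namespace, and whatever credentials you use need read access to it:

C:\WINDOWS\system32>rem How many alert documents exist
C:\WINDOWS\system32>curl -H "Authorization: ApiKey cmF3OnZhbHVl" "https://example.com:443/logs-endpoint.alerts-default/_count?pretty"

C:\WINDOWS\system32>rem Pull back the newest alert document
C:\WINDOWS\system32>curl -H "Authorization: ApiKey cmF3OnZhbHVl" "https://example.com:443/logs-endpoint.alerts-default/_search?size=1&sort=@timestamp:desc&pretty"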

If you do see the alert, the issue is likely that you need to enable the alert detection rules in the Security App. To do so go to "Security" -> "Detections", then click on "Manage detection rules". I recommend clicking "Load prebuilt detection rules and timeline templates", but either way make sure the "Elastic Endpoint Security" rule is enabled. After doing that, try generating an alert again.

Hope this helps!

Sorry, I've been a little delayed on testing; I've been rather busy. I still haven't had the time to get another fresh machine with Python on it to follow your steps. I will end up doing it soon, as I want to make sure it's not the API key preventing anything.

I do not have a usable logs-endpoint.alerts-default index at this time. I do see the template for it. To be honest I don't expect to see it yet, given the other issue of the agent not talking.
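This is what I'm going by: nothing comes back for the data stream itself, only the template is there (elastic user, same node as in my earlier logs, -k because of the internal CA):

C:\WINDOWS\system32>rem Look for any backing indices for the alerts data stream
C:\WINDOWS\system32>curl -k -u elastic "https://node3:9200/_cat/indices/*endpoint.alerts*?v"

C:\WINDOWS\system32>rem Check whether the data stream itself exists (404 in my case)
C:\WINDOWS\system32>curl -k -u elastic "https://node3:9200/_data_stream/logs-endpoint.alerts-default"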

I know you awesome devs have an update in the works, mostly around Endpoint not being able to communicate over TLS. All of my clusters run TLS, even the standalone test node, since the SIEM won't start without it.

I have not changed anything with the index. I leave the defaults and use them when needed. You guys have way more time and skill to do this, so I'll defer to your wisdom on that part :slight_smile:

@ferullo Going to mark your response as the solution. I never could get the API key to work, but with the SSL errors being a known issue it's better to wait than chase my tail anymore.

I repeated the Mimikatz test on another machine and it failed: Mimikatz was able to run without being stopped or the file being deleted.

Manually changing the elastic-endpoint.yaml and fleet.yml to use a hard path to the CA cert resolved the issues.

It's not elegant, scalable, or permanent, as each time you issue an update it gets overwritten, but at least it's repeatable.

I'm glad you found a solution, though I agree it's not a good lasting one. Can you share the config snippet you modified (sanitizing anything that is personal information, of course)?

elastic-endpoint.yaml:
fleet:
  agent:
    id: client_id_goes_here
  api:
    access_api_key: API_Key_goes_here
    kibana:
      host: kibana_server_name_goes_here:5601
      protocol: https
      ssl:
        certificate_authorities:
          - C:\Program Files\Elastic\Endpoint\ca.crt
        renegotiation: never
        verification_mode: full
      timeout: 1m30s

fleet.yml
ssl:
  verification_mode: full
  certificate_authorities:
    - C:\Program Files\Elastic\Endpoint\ca.crt
  renegotiation: never

After testing on a few machines, up to 10 currently, it's not consistent: some work, some don't. The agent version is still 7.9.2, mind you. It's FAR better than before, as I'm not seeing the disconnect notice or degraded status nearly as often. It's now more consistently showing successful messages.

After 4 hours the disconnect messages start to reappear in the Endpoint logs on the client, and the degraded status starts to reappear, so I'm afraid it was short-lived. This has only been tried on a single cluster, but seeing as the same CA is used for both it shouldn't make any difference.

It would be awesome if we could use the built-in cert stores on Windows and Linux for the CA lookup, as in larger networks that would scale and allow for cert lookup and invalidation. I do believe this has already been requested and is in the works on GitHub, so I'm just saying it for the people that read the forums.

@ferullo

7.10 = Works. Thank you for your hard work!

The only comments I have are on the Fleet tab (the former Ingest Manager): have a reminder to set the Elasticsearch host, since it's localhost by default, and then in the YAML entry spot have a link to the syntax that shows the supported options for output.

That's great! I'm glad to hear 7.10 was a smoother rollout for you.

I've relayed your comment internally about the localhost configuration workflow.