Problems with enrichments

Morning,
This is my first time troubleshooting Filebeat. I searched the log for pipeline and errors, and I think the message field is the important one. I got many lines like the following:
awk '/pipeline/ && /error/ {print}' filebeat_output.txt | jq . | grep message
"message": "Non-zero metrics in the last 30s",
"message": "Non-zero metrics in the last 30s",
"message": "Failed to publish event: failed to compute fingerprint: failed to find field [device_timestamp] in event: key not found",
"message": "Non-zero metrics in the last 30s",
"message": "Non-zero metrics in the last 30s",
"message": "Non-zero metrics in the last 30s",

Here is a log event:

{
  "_index": ".ds-carbon_black_observations-2025.02.08-2025.02.07-000001",
  "_id": "c6eb5534b0dc39b25585d6604cfaacd9de47a1bbe6da21167fdb73cfb7dbe754",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@timestamp": "2025-02-08T06:55:05.640Z",
    "device_policy_id": 245339,
    "host": {
      "name": "x.arc.local"
    },
    "netconn_count": 0,
    "observation_id": "AAF299E7-E5D6-11EF-84A5-30E1716D2521:aaf299e6-e5d6-11ef-84a5-30e1716d2521",
    "attack_tactic": "TA0005",
    "childproc_count": 0,
    "ecs": {
      "version": "8.0.0"
    },
    "agent": {
      "name": "x.arc.local",
      "type": "filebeat",
      "version": "8.17.0",
      "ephemeral_id": "28ff7bf3-90dd-4810-852d-caf781e42c38",
      "id": "25dd37f4-faea-4548-b443-ac5c7af95308"
    },
    "log": {
      "offset": 3277,
      "file": {
        "path": "/home/arc/cb.json"
      }
    },
    "ingress_time": 1738996775391,
    "enriched_event_type": [
      "CREATE_PROCESS"
    ],
    "regmod_count": 0,
    "parent_guid": "xxxxx-01d0d9b6-000009e0-00000000-1db6837152fcbc5",
    "rule_id": "2DEED2A4-0115-4AF7-B6E2-FDCD30F5F7E5",
    "attack_technique": "T1027.010",
    "device_id": 30464438,
    "org_id": "xxxxx",
    "parent_pid": 2528,
    "event_id": "AAF299E7-E5D6-11EF-84A5-30E1716D2521",
    "scriptload_count": 0,
    "process_guid": "xxxxx-01d0d9b6-00003850-00000000-1db79f42aaf3f63",
    "device_group_id": 0,
    "modload_count": 0,
    "process_name": "c:\\windows\\system32\\cmd.exe",
    "input": {
      "type": "log"
    },
    "crossproc_count": 0,
    "process_hash": [
      "f4f684066175b77e0c3a000549d2922c",
      "935c1861df1f4018d698e8b65abfa02d7e9037d8f68ca3c2065b6ca165d44ad2"
    ],
    "backend_timestamp": "2025-02-08T06:40:24.603Z",
    "filemod_count": 0,
    "event_description": "The application c:\\windows\\system32\\cmd.exe launched a process using character encoding command switches.",
    "process_pid": [
      14416
    ],
    "event": {
      "dataset": "carbon_black.observations"
    },
    "device_timestamp": "2025-02-08T06:39:13.297Z",
    "event_type": "childproc",
    "observation_description": "The application c:\\windows\\system32\\cmd.exe launched a process using character encoding command switches.",
    "process_username": [
      "NT AUTHORITY\\SYSTEM"
    ],
    "device_name": "teknik\\veeam03",
    "observation_type": "INDICATOR_OF_ATTACK"
  },
  "fields": {
    "childproc_count": [
      0
    ],
    "parent_guid": [
      "xxxxx-01d0d9b6-000009e0-00000000-1db6837152fcbc5"
    ],
    "observation_description": [
      "The application c:\\windows\\system32\\cmd.exe launched a process using character encoding command switches."
    ],
    "event_description": [
      "The application c:\\windows\\system32\\cmd.exe launched a process using character encoding command switches."
    ],
    "ingress_time": [
      1738996775391
    ],
    "enriched_event_type": [
      "CREATE_PROCESS"
    ],
    "netconn_count": [
      0
    ],
    "agent.type": [
      "filebeat"
    ],
    "device_group_id": [
      0
    ],
    "device_name": [
      "teknik\\veeam03"
    ],
    "event_type": [
      "childproc"
    ],
    "process_guid": [
      "xxxxx-01d0d9b6-00003850-00000000-1db79f42aaf3f63"
    ],
    "process_hash": [
      "f4f684066175b77e0c3a000549d2922c",
      "935c1861df1f4018d698e8b65abfa02d7e9037d8f68ca3c2065b6ca165d44ad2"
    ],
    "device_timestamp": [
      "2025-02-08T06:39:13.297Z"
    ],
    "process_name": [
      "c:\\windows\\system32\\cmd.exe"
    ],
    "parent_pid": [
      2528
    ],
    "agent.name": [
      "x.arc.local"
    ],
    "attack_tactic": [
      "TA0005"
    ],
    "host.name": [
      "x.arc.local"
    ],
    "process_username": [
      "NT AUTHORITY\\SYSTEM"
    ],
    "device_policy_id": [
      245339
    ],
    "attack_technique": [
      "T1027.010"
    ],
    "device_id": [
      30464438
    ],
    "regmod_count": [
      0
    ],
    "crossproc_count": [
      0
    ],
    "filemod_count": [
      0
    ],
    "scriptload_count": [
      0
    ],
    "input.type": [
      "log"
    ],
    "backend_timestamp": [
      "2025-02-08T06:40:24.603Z"
    ],
    "log.offset": [
      3277
    ],
    "modload_count": [
      0
    ],
    "agent.hostname": [
      "x.arc.local"
    ],
    "process_pid": [
      14416
    ],
    "observation_type": [
      "INDICATOR_OF_ATTACK"
    ],
    "rule_id": [
      "2DEED2A4-0115-4AF7-B6E2-FDCD30F5F7E5"
    ],
    "@timestamp": [
      "2025-02-08T06:55:05.640Z"
    ],
    "event_id": [
      "AAF299E7-E5D6-11EF-84A5-30E1716D2521"
    ],
    "agent.id": [
      "25dd37f4-faea-4548-b443-ac5c7af95308"
    ],
    "ecs.version": [
      "8.0.0"
    ],
    "observation_id": [
      "AAF299E7-E5D6-11EF-84A5-30E1716D2521:aaf299e6-e5d6-11ef-84a5-30e1716d2521"
    ],
    "org_id": [
      "xxxxx"
    ],
    "log.file.path": [
      "/home/arc/cb.json"
    ],
    "agent.ephemeral_id": [
      "28ff7bf3-90dd-4810-852d-caf781e42c38"
    ],
    "agent.version": [
      "8.17.0"
    ],
    "event.dataset": [
      "carbon_black.observations"
    ]
  }
}

So these messages are completely normal; they just show periodic stats.
"message": "Non-zero metrics in the last 30s",...

If you look at the event you published... the first thing I notice is that the pipeline_name field added by this processor is not in the event.

    {
      "set": {
        "field": "pipeline_name",
        "value": "mitre_attack_pipeline"
      }
    },

That means one of two things:

  1. Your pipeline is not being called... assuming you left that processor in. Did you?

  2. You have created a custom mapping / template that is not allowing the fields to be added.
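
One quick way to check the first possibility (a sketch, using the pipeline name above) is to simulate the pipeline in Dev Tools; if the enrich policies are missing you will get an error here too, which is also useful information:

POST _ingest/pipeline/mitre_attack_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "attack_tactic": "TA0005",
        "attack_technique": "T1027.010"
      }
    }
  ]
}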

I also noticed that your backing index name is very odd and not standard:

".ds-carbon_black_observations-2025.02.08-2025.02.07-000001",

So this leads me way back to the very beginning ...

Did you set up some custom data stream and template?

Are you following some 3rd-party article?

I am not sure what you are trying to accomplish.

Why are you naming the indices the way you are?

Are you using any of the filebeat modules?

It seems you are new to Filebeat but are trying to make changes whose impact you don't quite understand...

Share your entire filebeat.yml exactly as you ran it... and we will try a couple of experiments....

I think we can accomplish this in a better / more proper way... please share your entire filebeat.yml as you last ran it.

So I rewrote your filebeat.yml with a more modern / best-practice approach...

Please try this, and then we will work on the ingest pipeline (check for typos).

This will route the data into the new logs-* data stream framework.
Get this working, then we will get the ingest pipeline working.

This will put your events into the following data streams

logs-carbon_black.observations-default and
logs-carbon_black.vulnerabilities-default

Which is the 8.X way of doing this.

NOTE: you are using the deprecated log input; you should be using the new filestream input, but let's get this working first.

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

# Let the logs-* data streams manage their own index template and ILM policy
setup.ilm.enabled: false
setup.template.enabled: false

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /home/arc/cb.json
  json.keys_under_root: true
  json.overwrite_keys: true
  processors:
    # Content-derived _id so re-ingesting the same file does not create duplicate documents
    - fingerprint:
        fields: ["event_id", "device_timestamp"]
        target_field: "@metadata._id"
  fields_under_root: true
  fields:
    data_stream.type: logs
    data_stream.dataset: carbon_black.observations
    data_stream.namespace: default
    event.dataset: carbon_black.observations

- type: log
  enabled: true
  paths:
    - /home/arc/vuln.json
  json.keys_under_root: true
  json.overwrite_keys: true
  processors:
    - fingerprint:
        fields: ["created_at", "cve_id"]
        target_field: "@metadata._id"
  fields_under_root: true
  fields:
    data_stream.type: logs
    data_stream.dataset: carbon_black.vulnerabilities
    data_stream.namespace: default
    event.dataset: carbon_black.vulnerabilities       

output.elasticsearch:
  hosts: ["https://x:9200", "https://x:9200", "https://x:9200"]
  protocol: "https"
  ssl.verification_mode: "none"
  username: "elastic"
  password: "xxxx"
  index: "%{[data_stream.type]}-%{[data_stream.dataset]}-%{[data_stream.namespace]}"

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0640
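
Before starting it, sanity-check the configuration and the connection to Elasticsearch (both are standard Filebeat subcommands):

sudo filebeat test config
sudo filebeat test output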

Also, I am not sure why you are setting your own _id; this is usually an expert-level setting... Who / what is telling you to do that?

Hey Stephen,
I just started Filebeat with your config.
sudo filebeat test config
Config OK

Let's wait for new logs and see :slight_smile:

Since I don't want duplicate log entries, I use that solution to make the _id unique.

Now no observation logs are coming in to Elastic. :confused:
I'll start it in the background and see if there are any errors.

It should end up in this index, right?
carbon_black_observations-2025.02.09

Not many errors; I see only one in the stdout file.

@arcsons Apologies, I had a small error in the configuration above. I fixed it, but now I have a new configuration that is better and tested.

Here is a new configuration I have tested, and it works...

I converted it to the filestream input, as the log input is deprecated.
It now uses the following data streams:

logs-carbon_black.observations-default and
logs-carbon_black.vulnerabilities-default

You should be able to see these logs in Discover with the logs-* data view, or in Dev Tools:
GET logs-carbon_black.observations-default/_search

Get this working, then we will get the enrich working... again, apologies. That should be pretty straightforward after this works.

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.ilm.enabled: false
setup.template.enabled: false

filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /home/arc/cb.json
  parsers:
    - ndjson:
        target: ""
        message_key: message 
        overwrite_keys: true
  processors:
    - fingerprint:
        fields: ["event_id", "device_timestamp"]
        target_field: "@metadata._id"
  fields_under_root: true    
  fields:
    data_stream.type: logs
    data_stream.dataset: carbon_black.observations
    data_stream.namespace: default
    event.dataset: carbon_black.observations

- type: filestream
  enabled: true
  paths:
    - /home/arc/vuln.json
  parsers:
    - ndjson:
        target: ""
        message_key: message 
        overwrite_keys: true
  processors:
    - fingerprint:
        fields: ["created_at", "cve_id"]
        target_field: "@metadata._id"
  fields_under_root: true      
  fields:
    data_stream.type: logs
    data_stream.dataset: carbon_black.vulnerabilities
    data_stream.namespace: default
    event.dataset: carbon_black.vulnerabilities       

output.elasticsearch:
  hosts: ["https://x:9200", "https://x:9200", "https://x:9200"]
  protocol: "https"
  ssl.verification_mode: "none"
  username: "elastic"
  password: "xxxx"
  index: "%{[data_stream.type]}-%{[data_stream.dataset]}-%{[data_stream.namespace]}"

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0640
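
One more thing worth adding when you next touch this: each filestream input should have a unique id; Filebeat uses it to track file state, and omitting or changing it can lead to duplicated data. For example (the id value is just a suggestion):

- type: filestream
  id: carbon-black-observations
  ...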

Here are the results using your sample above:

GET logs-carbon_black.observations-default/_search

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": ".ds-logs-carbon_black.observations-default-2025.02.09-000001",
        "_id": "c6eb5534b0dc39b25585d6604cfaacd9de47a1bbe6da21167fdb73cfb7dbe754",
        "_score": 1,
        "_source": {
          "childproc_count": 0,
          "agent": {
            "name": "hyperion",
            "id": "f4840dab-b33d-4d35-82cb-1c6ae0bde624",
            "ephemeral_id": "a6bd2758-9f68-4adf-ba31-2e4036a60c97",
            "type": "filebeat",
            "version": "8.16.1"
          },
          "parent_guid": "xxxxx-01d0d9b6-000009e0-00000000-1db6837152fcbc5",
          "observation_description": """The application c:\windows\system32\cmd.exe launched a process using character encoding command switches.""",
          "log": {
            "file": {
              "inode": "135105789",
              "path": "/Users/sbrown/workspace/sample-data/discuss/carbonblack/carbonblackobs.ndjson",
              "device_id": "16777222"
            },
            "offset": 0
          },
          "event_description": """The application c:\windows\system32\cmd.exe launched a process using character encoding command switches.""",
          "ingress_time": 1738996775391,
          "enriched_event_type": [
            "CREATE_PROCESS"
          ],
          "netconn_count": 0,
          "device_name": """teknik\veeam03""",
          "device_group_id": 0,
          "event_type": "childproc",
          "process_guid": "xxxxx-01d0d9b6-00003850-00000000-1db79f42aaf3f63",
          "ecs": {
            "version": "8.0.0"
          },
          "process_hash": [
            "f4f684066175b77e0c3a000549d2922c",
            "935c1861df1f4018d698e8b65abfa02d7e9037d8f68ca3c2065b6ca165d44ad2"
          ],
          "device_timestamp": "2025-02-08T06:39:13.297Z",
          "process_name": """c:\windows\system32\cmd.exe""",
          "host": {
            "name": "hyperion"
          },
          "parent_pid": 2528,
          "attack_tactic": "TA0005",
          "event": {
            "dataset": "carbon_black.observations"
          },
          "process_username": [
            """NT AUTHORITY\SYSTEM"""
          ],
          "device_policy_id": 245339,
          "attack_technique": "T1027.010",
          "device_id": 30464438,
          "regmod_count": 0,
          "crossproc_count": 0,
          "scriptload_count": 0,
          "filemod_count": 0,
          "modload_count": 0,
          "backend_timestamp": "2025-02-08T06:40:24.603Z",
          "process_pid": [
            14416
          ],
          "observation_type": "INDICATOR_OF_ATTACK",
          "rule_id": "2DEED2A4-0115-4AF7-B6E2-FDCD30F5F7E5",
          "input": {
            "type": "filestream"
          },
          "@timestamp": "2025-02-09T16:53:30.964Z",
          "event_id": "AAF299E7-E5D6-11EF-84A5-30E1716D2521",
          "observation_id": "AAF299E7-E5D6-11EF-84A5-30E1716D2521:aaf299e6-e5d6-11ef-84a5-30e1716d2521",
          "org_id": "xxxxx",
          "data_stream": {
            "namespace": "default",
            "type": "logs",
            "dataset": "carbon_black.observations"
          }
        }
      }
    ]
  }
}

Assuming you have the above working, this is how we will call your enrich pipeline.

We will follow the new framework here. It refers to Elastic Agent, but it works with the data streams we created.

Your existing enrich pipeline:

PUT _ingest/pipeline/mitre_attack_pipeline
{
  "description": "Pipeline to enrich MITRE ATT&CK fields",
  "processors": [
    {
      "set": {
        "field": "pipeline_name",
        "value": "mitre_attack_pipeline"
      }
    },
    {
      "enrich": {
        "policy_name": "mitre_tactic_policy",
        "field": "attack_tactic",
        "target_field": "attack_tactic_description",
        "ignore_missing": true
      }
    },
    {
      "enrich": {
        "policy_name": "mitre_technique_policy",
        "field": "attack_technique",
        "target_field": "attack_technique_description",
        "ignore_missing": true
      }
    }
  ]
}

And now this...
This pipeline uses the data stream framework to call your pipeline for the correct data streams. I tested it without your enrich policies (I do not have them), but I verified that the pipeline above is called and adds pipeline_name to the document, so I know it is running.

PUT _ingest/pipeline/logs@custom
{
  "processors": [
    {
      "pipeline": {
        "name": "mitre_attack_pipeline",
        "if": "ctx?.data_stream.dataset != null && (ctx?.data_stream.dataset == 'carbon_black.observations' || ctx?.data_stream.dataset == 'carbon_black.vulnerabilities')"
      }
    }
  ]
}
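
Once documents are flowing, you can confirm the pipeline actually ran by searching for the pipeline_name field it sets:

GET logs-carbon_black.observations-default/_search
{
  "query": {
    "exists": { "field": "pipeline_name" }
  }
}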

Get this working, then we may need to validate your enrich processors etc.

Grr, it didn't work with your new filebeat.yml.

I haven't changed anything about your last post with the pipeline.

I did add your PUT _ingest/pipeline/logs@custom.
Let's see if that works..

:slight_smile: After I added that pipeline it works :slight_smile:
Now I must take some time to learn your config.

Can you please help me remove those two fields?

You need to be more precise. What "it" didn't work?

....

Oh, I just saw it. It does work. Awesome!

Look at the configuration and ask me some questions if need be

About removing the other fields...

You can choose the fields you want added from the enrich policy... so go back and look at that and only include the fields you want via the policy's enrich_fields setting (see the sketch after this list).

Or just add a remove processor to the pipeline and drop the fields you don't want.

Those are your choices
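
For the first option, here is a sketch of what an enrich policy with a trimmed enrich_fields list looks like; the source index and field names are made up, so substitute your own. Note that enrich policies cannot be updated in place: delete the old one, recreate it, then execute it again.

PUT _enrich/policy/mitre_tactic_policy
{
  "match": {
    "indices": "mitre_tactics",
    "match_field": "attack_tactic",
    "enrich_fields": ["tactic_name"]
  }
}

POST _enrich/policy/mitre_tactic_policy/_execute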

If you share your enrich policy, perhaps I can take a look later.

Also share one of the documents that's in the enrich index

Just removed the fields :wink:
So nice that you have spent so much time helping me, big thanks!

Hi again Stephen,

I have an issue: I can't find the vulnerability logs in logs-carbon_black.vulnerabilities-default.

I found these errors in the filebeat log:

},"message":"Failed to publish event: failed to compute fingerprint: failed to find field [created_at] in event: key not found","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-02-11T11:09:46.073+0100","log.logger":"publisher","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).publish","file.name":"pipeline/client.go","file.line":99

Do you maybe know what could be wrong?

- type: filestream
  enabled: true
  paths:
    - /home/arc/vuln.json
  parsers:
    - ndjson:
        target: ""
        message_key: message 
        overwrite_keys: true
  processors:
    - fingerprint:
        fields: ["created_at", "cve_id"]
        target_field: "@metadata._id"
  fields_under_root: true      
  fields:
    data_stream.type: logs
    data_stream.dataset: carbon_black.vulnerabilities
    data_stream.namespace: default
    event.dataset: carbon_black.vulnerabilities       


# jq . vuln.json

{
  "os_product_id": "1161_1398769",
  "category": "APP",
  "os_info": {
    "os_type": "WINDOWS",
    "os_name": "Microsoft Windows 10 Enterprise",
    "os_version": "10.0.19045",
    "os_arch": "64-bit"
  },
  "product_info": {
    "vendor": "Mozilla",
    "product": "Mozilla Firefox ESR (x64 sv-SE)",
    "version": "115.6.0",
    "release": null,
    "arch": ""
  },
  "vuln_info": {
    "cve_id": "CVE-2024-7529",
    "cve_description": "The date picker could partially obscure security prompts. This could be used by a malicious site to trick a user into granting permissions. This vulnerability affects Firefox < 129, Firefox ESR < 115.14, Firefox ESR < 128.1, Thunderbird < 128.1, and Thunderbird < 115.14.",
    "risk_meter_score": 0.7,
    "severity": "LOW",
    "fixed_by": "128.1",
    "solution": null,
    "created_at": "2024-08-06T13:15:57Z",
    "nvd_link": "https://nvd.nist.gov/vuln/detail/CVE-2024-7529",
    "cvss_access_complexity": "LOW",
    "cvss_access_vector": "NETWORK",
    "cvss_authentication": "NONE",
    "cvss_availability_impact": "NONE",
    "cvss_confidentiality_impact": "NONE",
    "cvss_integrity_impact": "HIGH",
    "easily_exploitable": false,
    "malware_exploitable": false,
    "active_internet_breach": false,
    "cvss_exploit_subscore": 0.0,
    "cvss_impact_subscore": 0.0,
    "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:N",
    "cvss_v3_exploit_subscore": 2.8,
    "cvss_v3_impact_subscore": 3.6,
    "cvss_v3_vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:N",
    "cvss_score": 0.0,
    "cvss_v3_score": 6.5
  },
  "device_count": 7,
  "affected_assets": [
    "xxxxx",
  ],
  "rule_id": null,
  "dismissed": false,
  "dismiss_reason": null,
  "notes": null,
  "dismissed_on": null,
  "dismissed_by": null,
  "deployment_type": null,
  "cve_id": "CVE-2024-7529"
}
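
Looking at the sample, created_at only exists nested as vuln_info.created_at, so I assume the fingerprint processor could not find a top-level created_at. Pointing it at the nested path might also have worked (untested):

  processors:
    - fingerprint:
        fields: ["vuln_info.created_at", "cve_id"]
        target_field: "@metadata._id"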

I solved it now with:

- type: filestream
  enabled: true
  paths:
    - /home/arc/vuln.json
  parsers:
    - ndjson:
        target: ""
        message_key: message
        overwrite_keys: true
  processors:
    - decode_json_fields:
        fields: ["message"]
        target: ""
        overwrite_keys: true
    - fingerprint:
        fields: ["created_at", "cve_id"]
        target_field: "@metadata._id"

But now the documents in the vuln data stream get both cve_id and vuln_info.cve_id fields.

A short question regarding the config you made for me.

Obs logs go to the data stream logs-carbon_black.observations-default, with backing index
.ds-logs-carbon_black.observations-default-2025.02.09-000001
Shouldn't it be today's date?

Probably not. This is a data stream governed by an ILM policy; it does not roll over every day. That date is when the backing index was first created.

Now you need to read about ILM (Index Lifecycle Management).

The default policy rolls over the data when the shard size reaches 50GB or the index is 30 days old. BTW, rolling over every day looks / feels nice to beginners, but it is not best practice and leads to issues as your data grows.
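
You can see which ILM policy applies and where the backing index sits in its lifecycle with:

GET logs-carbon_black.observations-default/_ilm/explain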

Read this

I suspect you are reading / using an old 3rd-party "guide" or something... make sure you look at our docs, as Elastic evolves quickly and guides fall out of date quickly.

Please open a new topic for any additional questions
