Data Format Problem in Elastic: Direct Beats vs Logstash

Hi everyone,

I am currently having a problem ingesting my logs into Elasticsearch correctly: the logs that arrive via beats=>logstash=>elastic look different from the ones that arrive via beats=>elastic (I set up two Elasticsearch instances to test this).

My beats=>elastic raw JSON logs look like this:

{
  "_index": "winlogbeat-7.11.1-2021.04.29-000003",
  "_type": "_doc",
  "_id": "CzfmqXkBuIv63rU8zefm",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2021-05-26T18:19:03.479Z",
    "agent": {
      "version": "7.11.1",
      "hostname": "DESKTOP-G8KP9KF",
      "ephemeral_id": "b42deb6b-f405-4db5-b167-82ce7aa928ff",
      "id": "4a7919ac-bc05-4eb8-b634-6a58c7fafeeb",
      "name": "DESKTOP-G8KP9KF",
      "type": "winlogbeat"
    },
    "message": "Process Create:\nRuleName: -\nUtcTime: 2021-05-26 18:19:03.479\nProcessGuid: {2c8feff4-9117-60ae-9602-000000000700}\nProcessId: 4628\nImage: C:\\Program Files\\WindowsApps\\Microsoft.MicrosoftOfficeHub_18.2104.12721.0_x64__8wekyb3d8bbwe\\LocalBridge.exe\nFileVersion: 18.2104.1272.0\nDescription: LocalBridge\nProduct: LocalBridge\nCompany: -\nOriginalFileName: LocalBridge.exe\nCommandLine: \"C:\\Program Files\\WindowsApps\\Microsoft.MicrosoftOfficeHub_18.2104.12721.0_x64__8wekyb3d8bbwe\\LocalBridge.exe\" /InvokerPRAID: Microsoft.MicrosoftOfficeHub notifications\nCurrentDirectory: C:\\WINDOWS\\system32\\\nUser: DESKTOP-G8KP9KF\\mvm\nLogonGuid: {2c8feff4-8634-60ae-5628-020000000000}\nLogonId: 0x22856\nTerminalSessionId: 1\nIntegrityLevel: Medium\nHashes: MD5=9774AC9F3B1C9B7CEB3A28568EFF0720,SHA256=4D18D9ACA0208F2C244E751AD6B3D8472308616A1D0CAA38988DA78AAD9C11AE,IMPHASH=00000000000000000000000000000000\nParentProcessGuid: {2c8feff4-9117-60ae-9502-000000000700}\nParentProcessId: 1604\nParentImage: C:\\Windows\\System32\\RuntimeBroker.exe\nParentCommandLine: C:\\Windows\\System32\\RuntimeBroker.exe -Embedding",
    "host": {
      "ip": [
        "10.10.10.201",
        "fe80::1948:f664:3b8c:f8ec",
        "192.168.225.162",
        "fe80::e9e0:19b8:b633:eba",
        "10.9.9.250",
        "2a02:8070:4181:30f0::c94",
        "2a02:8070:4181:30f0:2cbe:4d3c:b523:322e",
        "fd75:16d0:de9e::c94",
        "fd75:16d0:de9e:0:2cbe:4d3c:b523:322e",
        "2a02:8070:4181:30f0:31d0:2f77:1d20:8049",
        "fd75:16d0:de9e:0:31d0:2f77:1d20:8049",
        "fe80::2cbe:4d3c:b523:322e",
        "10.9.8.163"
      ],
      "mac": [
        "00:0c:29:63:cd:f2",
        "00:0c:29:63:cd:fc",
        "00:0c:29:63:cd:06"
      ],
      "hostname": "DESKTOP-G8KP9KF",
      "architecture": "x86_64",
      "name": "DESKTOP-G8KP9KF",
...

And the beats=>logstash=>elastic logs look like this:

{
  "_index": "winlogbeat-7.13.3-2021.07.17-000001",
  "_type": "_doc",
  "_id": "9i6OtHoBsAtFNcPbpqtL",
  "_version": 1,
  "_score": null,
  "fields": {
    "process.hash.md5": [
      "bdb0b06d6a88e47add4076e2adec2e93"
    ],
    "winlog.event_data.LogonId": [
      "0x3e7"
    ],
    "event.category": [
      "process"
    ],
    "process.name.text": [
      "mighost.exe"
    ],
    "host.os.name.text": [
      "Windows 10 Pro"
    ],
    "winlog.provider_name": [
      "Microsoft-Windows-Sysmon"
    ],
    "winlog.provider_guid": [
      "{5770385f-c22a-43e0-bf4c-06f5698ffbd9}"
    ],
    "process.parent.command_line": [
      "\"C:\\$WINDOWS.~BT\\Sources\\SetupHost.Exe\" /Install /Package /Quiet  /ReportId FF95561B-934D-4F58-AF55-CABBDFFB1A92.1 /FlightData \"RS:ADF2\" \"/CancelId\" \"C-29736b89-8658-4a85-aeaf-39c2b12bc06f\" \"/PauseId\" \"P-29736b89-8658-4a85-aeaf-39c2b12bc06f\" \"/CorrelationVector\" \"jU/UeOMLSUuuQff/.1.5.17.1.3.151\" \"/EnterpriseAttribution\" \"/ActionListFile\" \"C:\\Windows\\SoftwareDistribution\\Download\\9439f90370086bc5c43cd52ea62a43e9\\ActionList.xml\" "
    ],
    "process.parent.name": [
      "SetupHost.exe"
    ],
    "process.parent.pid": [
      7660
    ],
    "process.hash.sha256": [
      "1f3f5041b1f5dbe8fbd61f50d7f1cd44f1d98578cf037b86fcc6313458ccbb20"
    ],
    "related.hash": [
      "bdb0b06d6a88e47add4076e2adec2e93",
      "1f3f5041b1f5dbe8fbd61f50d7f1cd44f1d98578cf037b86fcc6313458ccbb20",
      "1a2e6fbe71caa18e49b7aebbc2eac135"
    ],
    "host.hostname": [
      "CLIENT-10-01"
    ],
    "process.pid": [
      2228
    ],
    "winlog.computer_name": [
      "CLIENT-10-01.sec.lab.local"
    ],
    "host.mac": [
      "00:0c:29:07:cc:c7",
      "00:0c:29:07:cc:d1"
    ],
    "winlog.process.pid": [
      9560
    ],
    "process.parent.entity_id": [
      "{2c8feff4-d155-60f2-638f-0e0200000000}"
    ],
...

As you can see, with Logstash in between, every single field is a "list" ("FIELD": ["string"]) or a "multi field" (that is the label shown on each field in the parsed log view). This causes a problem for the detections in the SIEM part of Elastic, because the detections seem to expect a string as the data type instead of a list of strings.
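To make the difference between the two shapes concrete: the first document is shown under "_source" with nested objects, while the second is shown under "fields" with dotted keys. As far as I understand the Elasticsearch documentation, the "fields" section of a search response returns every value as an array, even a single value, so the two views may describe identically stored data. Below is a minimal Python sketch for checking that on a pair of views of the same event. The function names are my own, and SAMPLE_SOURCE is a hypothetical "_source" that I constructed for illustration from values in the "fields" output above; it is not a real document from my cluster.

```python
from typing import Any, Dict, List


def flatten_source(source: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested _source object into dotted keys.

    Example: {"agent": {"type": "x"}} becomes {"agent.type": "x"}.
    Lists are kept as values and are not descended into.
    """
    flat: Dict[str, Any] = {}
    for key, value in source.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_source(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def as_list(value: Any) -> List[Any]:
    """Wrap a scalar in a list so it can be compared with a 'fields' value."""
    return value if isinstance(value, list) else [value]


def same_values(source: Dict[str, Any], fields: Dict[str, List[Any]]) -> Dict[str, bool]:
    """Compare a _source view with a 'fields' view of the same event.

    Only keys present in both views are reported. Keys that exist only
    in 'fields' (for example multi-fields such as 'process.name.text')
    are skipped.
    """
    flat = flatten_source(source)
    return {key: as_list(flat[key]) == fields[key] for key in flat if key in fields}


# Hypothetical _source, built for illustration only.
SAMPLE_SOURCE = {
    "process": {"pid": 2228, "parent": {"name": "SetupHost.exe", "pid": 7660}},
    "host": {
        "hostname": "CLIENT-10-01",
        "mac": ["00:0c:29:07:cc:c7", "00:0c:29:07:cc:d1"],
    },
}

# Excerpt of the "fields" section shown in the post.
SAMPLE_FIELDS = {
    "process.pid": [2228],
    "process.parent.name": ["SetupHost.exe"],
    "process.parent.pid": [7660],
    "host.hostname": ["CLIENT-10-01"],
    "host.mac": ["00:0c:29:07:cc:c7", "00:0c:29:07:cc:d1"],
    "process.name.text": ["mighost.exe"],
}

if __name__ == "__main__":
    for key, equal in same_values(SAMPLE_SOURCE, SAMPLE_FIELDS).items():
        print(f"{key}: {'same' if equal else 'different'}")
```

If every key comes back as the same for a real event, the lists are most likely only how the response is presented, not how the event is stored. This sketch compares raw values only; it does not account for fields that the "fields" view may format differently, such as dates.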

I also tried to load the index template manually like this:

.\winlogbeat.exe setup --index-management -E 'output.elasticsearch.username="elastic"' -E 'output.elasticsearch.password="CHANGEME"' -E output.logstash.enabled=false -E 'output.elasticsearch.hosts=["10.9.9.91:9200"]' -E "setup.ilm.overwrite=true"

But this did not fix my problem.

My setup looks like this:

winlogbeat => logstash => elastic

I set up winlogbeat to use the Logstash output, and my Logstash config looks like this:

input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["http://10.9.9.91:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}"
    user => "logstash_internal"
    password => "CHANGEME"
  }
}

I also checked the Logstash output with the rubydebug codec on stdout, and there the data format was as expected, without lists wrapping the string data.

I am also fairly new to the Elastic Stack, so can anyone help?

Thanks in advance.