Help ingesting Data

Hi all, I'm new to Elastic and Logstash. I have a source of event data that I'm having problems ingesting. I think the issue is the data itself, but being new to Logstash it could also be me, so I'm not sure where the problem lies. I've tried so many different things I've lost track.

I created an HTTP input on port 9999 and also turned off security due to certificate issues.
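For context, my input is roughly this (a sketch; I'm not certain the SSL option name matches my plugin version, so treat ssl_enabled as an assumption):

input {
    http {
        # listen for events posted over HTTP on port 9999
        port => 9999
        # security turned off due to certificate issues
        # (assumption: newer versions of the http input use ssl_enabled, older ones use ssl)
        ssl_enabled => false
    }
}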

The source is sending each event in its own packet, in a stream, in the form of:

{"time": 1689541912.454, "event": {"nodeId": "nWXoUZ4CNTRL", "start": "2023-07-16T21:11:50.979186Z", "end": "2023-07-1621:11:53.929365 z", "virtualTraffic": [{"proto": 17, "src" : " [fd7a: 115c: ale0: ab12:4843: cd96:624:9181]:55993", "dst":" [fd7a: 115c: ale0::]:80", "txPkts" :1, "txBytes" :50}]3, "fields": {"recorded": "2023-07-1621:11:55.945733486Z"}}

But there is no comma between the events, and they don't appear to be wrapped in an array either.

My output looks like this:

{"time":1689606809.419,"event":{"nodeId":"nWXoUZ4CNTRL","start":"2023-07-17T15:13:27.94877Z","end":"2023-07-17T15:13:30.889983Z","virtualTraffic":[{"proto":17,"src":"[fd7a:115c:a1e0:ab12:4843:cd96:624a:9181]:62498","dst":"[fd7a:115c:a1e0::]:80","txPkts":1,"txBytes":50}]},"fields":{"recorded":"2023-07-17T15:13:32.936285253Z"}}{"time":1689606814.419,"event":{"nodeId":"nWXoUZ4CNTRL","start":"2023-07-17T15:13:32.950684Z","end":"2023-07-17T15:13:35.889057Z","virtualTraffic":[{"proto":17,"src":"[fd7a:115c:a1e0:ab12:4843:cd96:624a:9181]:56573","dst":"[fd7a:115c:a1e0::]:80","txPkts":1,"txBytes":50}]},"fields":{"recorded":"2023-07-17T15:13:37.935786873Z"}}{"time":1689606819.414,"event":{"nodeId":"nWXoUZ4CNTRL","start":"2023-07-17T15:13:37.938953Z","end":"2023-07-17T15:13:40.889584Z","virtualTraffic":[{"proto":17,"src":"[fd7a:115c:a1e0:ab12:4843:cd96:624a:9181]:59613","dst":"[fd7a:115c:a1e0::]:80","txPkts":1,"txBytes":50}]},"fields":{"recorded":"2023-07-17T15:13:42.937654623Z"}}{"time":1689606824.418,"event":{"nodeId":"nWXoUZ4CNTRL","start":"2023-07-17T15:13:42.947124Z","end":"2023-07-17T15:13:45.890088Z","virtualTraffic":[{"proto":17,"src":"[fd7a:115c:a1e0:ab12:4843:cd96:624a:9181]:63773","dst":"[fd7a:115c:a1e0::]:80","txPkts":1,"txBytes":50}]},"fields":{"recorded":"2023-07-17T15:13:47.93908194Z"}}{"time":1689606829.421,"event":{"nodeId":"nWXoUZ4CNTRL","start":"2023-07-17T15:13:47.95292Z","end":"2023-07-17T15:13:50.890138Z","virtualTraffic":[{"proto":17,"src":"[fd7a:115c:a1e0:ab12:4843:cd96:624a:9181]:64044","dst":"[fd7a:115c:a1e0::]:80","txPkts":1,"txBytes":50}]},"fields":{"recorded":"2023-07-17T15:13:52.933831386Z"}}{"time":1689606834.413,"event":{"nodeId":"nWXoUZ4CNTRL","start":"2023-07-17T15:13:52.938176Z","end":"2023-07-17T15:13:55.88964Z","virtualTraffic":[{"proto":17,"src":"[fd7a:115c:a1e0:ab12:4843:cd96:624a:9181]:53806","dst":"[fd7a:115c:a1e0::]:80","txPkts":1,"txBytes":50}]},"fields":{"recorded":"2023-07-17T15:13:57.935319203Z"}}{"time":1689606839.428,"event":{"nodeId":"nWXoUZ4CNTRL","start":"2023-07-17T15:13:57.967046Z","end":"2023-07-17T15:14:00.890223Z","virtualTraffic":[{"proto":17,"src":"[fd7a:115c:a1e0:ab12:4843:cd96:624a:9181]:51981","dst":"[fd7a:115c:a1e0::]:80","txPkts":1,"txBytes":50}]},"fields":{"recorded":"2023-07-17T15:14:02.936707269Z"}}{"time":1689606844.418,"event":{"nodeId":"nWXoUZ4CNTRL","start":"2023-07-17T15:14:02.944812Z","end":"2023-07-17T15:14:05.891353Z","virtualTraffic":[{"proto":17,"src":"[fd7a:115c:a1e0:ab12:4843:cd96:624a:9181]:57377","dst":"[fd7a:115c:a1e0::]:80","txPkts":1,"txBytes":50}]},"fields":{"recorded":"2023-07-17T15:14:07.937676143Z"}}{"time":1689606849.423,"event":{"nodeId":"nWXoUZ4CNTRL","start":"2023-07-17T15:14:07.955742Z","end":"2023-07-17T15:14:10.890294Z","virtualTraffic":[{"proto":17,"src":"[fd7a:115c:a1e0:ab12:4843:cd96:624a:9181]:50102","dst":"[fd7a:115c:a1e0::]:80","txPkts":1,"txBytes":50}]},"fields":{"recorded":"2023-07-17T15:14:12.935710252Z"}}

I've tried many filters with splits and mutates.

I do have a support ticket open with the vendor sending the logs and I'm waiting on a response, but figured I'd ask here as well.

Any help you can provide me will be hugely appreciated. Let me know any info you may need and I'll happily provide it.

The best solution would be to fix this at the source so it sends one JSON document per line, or at least an array where each item is a JSON document.
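For example, newline-delimited JSON would put one complete document per line (abbreviated here with ... for brevity):

{"time":1689606809.419,"event":{...},"fields":{...}}
{"time":1689606814.419,"event":{...},"fields":{...}}

while the array form would wrap the documents as [ {...}, {...} ] with commas between them.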

But while you wait for a response, you can use the following filters to fix your message:

filter {
    mutate {
        gsub => ["message", '}{"time', '}||{"time']
    }
    split {
        field => "message"
        terminator => "||"
    }
    json {
        source => "message"
    }
}

The gsub will replace every }{"time in your message with }||{"time; the || is a character sequence that will be used as an event separator.

The split filter will split the message field using || as the separator and create a new event for every item, so if you have this:

{a json event}||{another json event}

You will end up with 2 events.

Finally, the json filter will parse the message of each new event into fields.
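For testing, you can put everything together in one pipeline (a sketch; it assumes the HTTP input on port 9999 from your post and uses a stdout output with the rubydebug codec so you can see the parsed events):

input {
    http {
        # the HTTP input you already have listening on port 9999
        port => 9999
    }
}
filter {
    mutate {
        # insert a || marker between the concatenated JSON documents
        gsub => ["message", '}{"time', '}||{"time']
    }
    split {
        # create one event per JSON document
        field => "message"
        terminator => "||"
    }
    json {
        # parse each JSON document into fields
        source => "message"
    }
}
output {
    stdout {
        # print each parsed event in a readable form
        codec => rubydebug
    }
}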

In the end you will have an output like this:

{
      "@version" => "1",
          "time" => 1689606849.423,
    "@timestamp" => 2023-07-17T15:50:16.471551807Z,
         "event" => {
                "nodeId" => "nWXoUZ4CNTRL",
                   "end" => "2023-07-17T15:14:10.890294Z",
        "virtualTraffic" => [
            [0] {
                  "proto" => 17,
                    "dst" => "[fd7a:115c:a1e0::]:80",
                    "src" => "[fd7a:115c:a1e0:ab12:4843:cd96:624a:9181]:50102",
                "txBytes" => 50,
                 "txPkts" => 1
            }
        ],
                 "start" => "2023-07-17T15:14:07.955742Z"
    },
          "host" => "lab",
       "message" => "{\"time\":1689606849.423,\"event\":{\"nodeId\":\"nWXoUZ4CNTRL\",\"start\":\"2023-07-17T15:14:07.955742Z\",\"end\":\"2023-07-17T15:14:10.890294Z\",\"virtualTraffic\":[{\"proto\":17,\"src\":\"[fd7a:115c:a1e0:ab12:4843:cd96:624a:9181]:50102\",\"dst\":\"[fd7a:115c:a1e0::]:80\",\"txPkts\":1,\"txBytes\":50}]},\"fields\":{\"recorded\":\"2023-07-17T15:14:12.935710252Z\"}}",
        "fields" => {
        "recorded" => "2023-07-17T15:14:12.935710252Z"
    }
}

You are fantastic, thank you!

This has solved the problem (for now). I'll ask the vendor to make the changes.

I had previously tried both the gsub and the split, but only independently; I didn't think to use them together!

I also have to use

additional_codecs => { "application/json-seq" => "json_lines" }

in the input to make the split work as well.
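For completeness, my input now looks roughly like this (a sketch; SSL settings omitted):

input {
    http {
        # HTTP input listening on port 9999
        port => 9999
        # map the application/json-seq content type sent by the source to the json_lines codec
        additional_codecs => { "application/json-seq" => "json_lines" }
    }
}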
