Keep getting _jsonparsefailure tags

I am still relatively new to Logstash, so I have been taking a crash course in using it and getting logs to filter correctly. I finally have logs arriving as they should, but I cannot for the life of me figure out how to fix this error, which shows up in my sample file. My conf looks like this:

input {
  tcp {
    port => 9200
    codec => plain
  }
}

filter {
  mutate {
    gsub => [
      "message", "^.* - ", ""
    ]
  }
  json {
    source => "message"
  }
}

output {
  microsoft-sentinel-log-analytics-logstash-output-plugin {
    create_sample_file => true
    sample_file_path => "/tmp/logstash/phisher"
  }
}

Here is what one of the logs looks like in the sample file:

  {
    "tags": [
      "_jsonparsefailure"
    ],
    "message": "604 <118>1 - phisher.knowbe4.com PhishER - - - {\"receivedAt\":\"2024-09-23T11:39:26.772199Z\",\"reportedAt\":\"2024-09-23T11:39:17Z\",\"sender\":\"redacted@redacted.com\",\"reporter\":\"redacted@redacted.com\",\"subject\":\"You're Order Has Shipped!  - Order Number\",\"priority\":\"critical\",\"category\":\"threat\",\"status\":\"resolved\",\"tags\":[\"LA_CLOSE_TICKET_SUCCESS\",\"MANUAL_RESOLVED\",\"LA_OPEN_TICKET_SUCCESS\",\"MANUAL\",\"VT_SCANNED\",\"KB4:SHIPPING\",\"KB4:SPF_PASS\",\"KB4:DKIM_PASS\",\"KB4:BILLING\",\"USER:THREAT\"],\"action\":\"Syslog\",\"permalink\":\"https://phisher.knowbe4.com/inbox/3redacted75\"}",
    "ls_timestamp": "2024-09-24T02:59:48.370952470Z",
    "ls_version": "1"
  },

I have a hunch that this prefix is what breaks things, because of the <118>:
604 <118>1 - phisher.knowbe4.com PhishER - - -
I tried to split this part out of the message, and I have tried some other filtering as well.

Really, I just want the JSON object that starts at "receivedAt" to be the only thing in the message, so that receivedAt, reportedAt, sender, reporter, subject, priority, category, status, tags, action and permalink are each parsed into their own field as normal JSON, instead of arriving as one long line of data.
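For reference, that prefix looks like a standard RFC 5424 syslog header (priority <118>, version 1, then timestamp, host, app name, process ID, message ID and structured data, with "-" for empty values), preceded by what appears to be an RFC 6587 octet count (604). The <118> is not special to the json filter; the parse fails as long as any of that text sits in front of the opening brace. If you would rather keep the header values than throw them away, a dissect filter can split them off before parsing. This is a minimal sketch, assuming every PhishER line has exactly this layout; the syslog_* and phisher_json field names are made up for illustration.

```logstash
filter {
  dissect {
    mapping => {
      # Hypothetical field names. Assumes the layout
      # "<length> <PRI>VERSION TIMESTAMP HOST APP PROCID MSGID SD JSON".
      # %{?name} fields are matched but not stored; the last field
      # (phisher_json) receives the rest of the line, spaces included.
      "message" => "%{syslog_length} <%{syslog_pri}>%{syslog_version} %{syslog_timestamp} %{syslog_host} %{syslog_app} %{?procid} %{?msgid} %{?sd} %{phisher_json}"
    }
  }
  json {
    # Parse only the JSON part that dissect split off.
    source => "phisher_json"
  }
}
```

Because dissect splits on literal delimiters instead of a regex, the " - " inside the subject cannot confuse it: everything after the structured-data "-" lands in phisher_json untouched.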

It's hard to believe that you are actually running that configuration. The gsub would modify the [message] field, and because ".*" is greedy it would remove everything up to " Order Number" (there is a " - " embedded in the subject field of the JSON).

You could try

 mutate { gsub => [ "message", "^.*{", "{" ] }
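One caveat with that pattern: ".*" is still greedy, so "^.*{" removes everything up to the last "{" in the line. That is fine for the flat PhishER payload shown here, but if the JSON ever contained a nested object it would cut into it. A sketch of a variant that stops at the first brace instead, assuming the syslog prefix itself never contains a "{":

```logstash
filter {
  mutate {
    # Delete the leading run of characters that are not "{" (the syslog
    # prefix), leaving the JSON object, opening brace included, in [message].
    gsub => [ "message", "^[^{]*", "" ]
  }
}
```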

Sorry for the late response; other things came up, but I finally had time to get back to this. I swapped my config over to use your alteration:

input {
  tcp {
    port => 9200
    codec => json
  }
}
filter {
  mutate { gsub => [ "message", "^.*{", "{" ] }
  json {
    source => "message"
  }
}
output {
  microsoft-sentinel-log-analytics-logstash-output-plugin {
    create_sample_file => true
    sample_file_path => "/tmp/logstash/phisher"
  }
}

Now it isn't creating the sample file at all, and it is logging new parse errors instead:

[2024-10-11T03:24:10,124][WARN ][logstash.codecs.jsonlines][main][331942d974762e9cc71682554982ae4f3ce5b8a8d1aee0eabb183f627feac79e] JSON parse error, original data now in message field {:message=>"incompatible json object type=java.lang.Integer , only hash map or arrays are supported", :exception=>LogStash::Json::ParserError, :data=>"634 <118>1 - phisher.knowbe4.com PhishER - - - {\"receivedAt\":\"2024-10-10T14:46:13.795281Z\",\"reportedAt\":\"2024-10-10T14:46:02Z\",\"sender\":\"noreply@qrver.com\",\"reporter\":\"joe_@filtered.com\",\"subject\":\"Reminder: How can UPS serve your business better?\",\"priority\":\"low\",\"category\":\"spam\",\"status\":\"resolved\",\"tags\":[\"LA_CLOSE_TICKET_SUCCESS\",\"MANUAL_RESOLVED\",\"VT_SCANNED\",\"LA_OPEN_TICKET_SUCCESS\",\"MANUAL\",\"VT_BYPASSED\",\"KB4:SHIPPING\",\"KB4:SPF_PASS\",\"KB4:COMMUNICATION\",\"KB4:DKIM_PASS\",\"PML:SPAM\",\"USER:THREAT\",\"TI_SCANNED\"],\"action\":\"Syslog\",\"permalink\":\"https://phisher.knowbe4.com/inbox/eff5f74c-87556dcb056aa\"}"}
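The "logstash.codecs.jsonlines" in that warning shows the failure happens in the input, not in the filters: the json codec is being run as json_lines, which tries to decode each raw line before mutate ever sees it. The line starts with 634, so the parser apparently reads a bare number (hence "java.lang.Integer") instead of an object. While working through this, a temporary stdout output can show what each event looks like after the filters have run. This is a debugging sketch to add next to the Sentinel output, not part of the final setup:

```logstash
output {
  # Temporary debugging aid: print every event after filtering, so you can
  # see whether the fields were parsed or _jsonparsefailure was added.
  stdout {
    codec => rubydebug
  }
}
```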

But you are right. What I am really hoping for is to output just the "message" contents, split into a clean JSON object, something like:

{
  "receivedAt": "2024-10-10T14:46:13.795281Z",
  "reportedAt": "2024-10-10T14:46:02Z",
  "sender": "noreply@redacted.com",
  "reporter": "joe_@redacted.com",
  "subject": "Reminder: How can UPS serve your business better?",
  "priority": "low",
  "category": "spam",
  "status": "resolved",
  "tags": [
    "LA_CLOSE_TICKET_SUCCESS",
    "MANUAL_RESOLVED",
    "VT_SCANNED",
    "LA_OPEN_TICKET_SUCCESS",
    "MANUAL",
    "VT_BYPASSED",
    "KB4:SHIPPING",
    "KB4:SPF_PASS",
    "KB4:COMMUNICATION",
    "KB4:DKIM_PASS",
    "PML:SPAM",
    "USER:THREAT",
    "TI_SCANNED"
  ],
  "action": "Syslog",
  "permalink": "https://phisher.knowbe4.com/inbox/eff5f74c-87556dcb056aa"
}

Remove the codec from the input. The line cannot be parsed as JSON until the mutate has stripped the prefix, so the parsing has to happen in a filter (which runs after the mutate), not in a codec (which runs before it).
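Putting those suggestions together gives something like the sketch below. The gsub pattern is the one suggested earlier in the thread; the remove_field on the json filter is an addition not discussed here, which (assuming the parse succeeds) drops the raw string so that only the parsed PhishER fields and the usual Logstash metadata reach the Sentinel output.

```logstash
input {
  tcp {
    port => 9200
    # No codec: each line arrives in [message] as a plain string, so the
    # filters below can strip the syslog prefix before any JSON parsing.
  }
}

filter {
  mutate {
    # Replace everything up to the "{" with "{", removing the
    # "634 <118>1 - phisher.knowbe4.com PhishER - - - " prefix.
    gsub => [ "message", "^.*{", "{" ]
  }
  json {
    source => "message"
    # Common option, only applied when the parse succeeds: remove the raw
    # string once its contents have been copied into top-level fields.
    remove_field => [ "message" ]
  }
}

output {
  microsoft-sentinel-log-analytics-logstash-output-plugin {
    create_sample_file => true
    sample_file_path => "/tmp/logstash/phisher"
  }
}
```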