Logstash output plugin data changes when stored to file as compared to standard output stdout{ }

Hello All,

I am new to Logstash and am working on a task to ship syslog data from GitHub through Logstash to Elasticsearch. I am facing a strange problem in Logstash.

To understand the format of the data arriving on the Logstash input port, I used the multiline codec in the input configuration and no filter, and tried two options for the output.

Option 1 - Write events to standard output with stdout { }
In this case, I redirected the output through journalctl to a file. Below are some of the samples received.

Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: {
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "host" => "ec2-*-*-*-*.us-east-2.compute.amazonaws.com",
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "port" => 23239,
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "type" => "syslog",
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "@version" => "1",
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "message" => "<14>Nov  3 13:05:02 github-coll-com-primary github_production: [Producer ] Sending 1 messages to localhost:9092 (node_id=1)",
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "@timestamp" => 2022-11-04T09:22:53.829Z
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: }
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: {
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "host" => "ec2-x-x-x-x.us-east-2.compute.amazonaws.com",
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "port" => 23239,
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "type" => "syslog",
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "@version" => "1",
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "message" => "<142>Nov  3 13:05:03 github-coll-com-primary haproxy[28834]: x.x.x.x:27151 [03/Nov/2022:13:04:03.977] https_protocol~ alive/localhost 0/0/1/0/59887 101 468 - - ---- 35/35/29/29/0 0/0 {github.coll.com||Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWeb|TLSv1.3} \"GET /_sockets/u/310/ws /_sockets/u/310/ws?session=eyJ2IjoiVjMiLCJ1IjozMTAsInMiOjQxNDUsImMiOjIyMjY5MDEyNjYsInQiOjE2Njc0NTE5MDl9--10524eade56c197f1084d9eb6afedd9c4fe41de604b967a5a4f5fe9d0db3c8c7f69&shared=true&p=2115294313_1667285866.1129\"",
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: "@timestamp" => 2022-11-04T09:22:53.837Z
Nov 04 09:22:54 logstashpublic.coll.com logstash[53518]: }

It seems each event arrives spread across multiple lines, and the first character is "{".

Option 2 - Write output to a file through the Logstash output configuration
In this case, the output below is what was written to the file.

<14>Nov  4 10:19:03 github-coll-com-replica github-timerd[7269]: app=github env=production enterprise=true ns=timer_daemon now="2022-11-04T10:19:03Z" level=INFO msg="[ActiveJob] Enqueued RetryJobsJob (Job ID: d17f5194-0e9f-4421-a628-80954e79ee69) to Aqueduct(retry_jobs)"
<14>Nov  4 10:19:04 github-coll-com-replica github-timerd[7269]: app=github env=production enterprise=true ns=timer_daemon now="2022-11-04T10:19:04+00:00" at=run timer=RetryJobsJob

If you compare the two outputs, Option 1 starts with {, followed by the fields "host", "port", "type", "@version", "message" and "@timestamp", and ends with }.
Option 2 contains only the contents of the message field.

In both options, my Logstash config file had the same input and filter sections; only the output section changed. Below is the output section used for each option.

Option 1:

output {
        stdout{}
}

Option 2:

output {       
        file {
                path => "/var/log/tls_logs/github1.log"
                codec => line { format => "%{message}"}
        }
}

Because of this difference in output, I am not able to write any filters, and using the json codec in the output produces the parse error below:

[2022-11-04T12:25:18,576][WARN ][logstash.codecs.jsonlines][syslog_github][39b00e8e346a1a79c1759061ed3b3cb9688515d8831513ef5f5a3ccfdf34d4e7] JSON parse error, original data now in message field {:message=>"Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: (String)\"<30>Nov  4 12:25:18 github-coll-com-replica systemd[1]: Started Nomad jobs service.\"; line: 1, column: 2]"

Any help would be appreciated.

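They are different because you are telling Logstash to output them that way.

Your file output uses format => "%{message}", which tells Logstash to write only the message field.

Just remove the entire codec line from your file output and the file will contain the full event with the same fields. The layout may still look different, since the file output writes each event as one line of JSON by default while stdout pretty-prints it.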

        file {
                path => "/var/log/tls_logs/github1.log"
        }
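
To make the layout predictable, the codec can also be set explicitly on each output. The snippet below is a minimal sketch, not a tested config: json_lines writes the whole event as one JSON object per line, and rubydebug produces the same multi-line layout that stdout { } prints. The second path (github1_debug.log) is a hypothetical name used only for illustration.

```
output {
  # Whole event as one JSON object per line
  file {
    path => "/var/log/tls_logs/github1.log"
    codec => json_lines
  }

  # Same pretty-printed layout as stdout { }; hypothetical path, for debugging only
  file {
    path => "/var/log/tls_logs/github1_debug.log"
    codec => rubydebug
  }
}
```

If another pipeline is going to read the file back in, the JSON form is usually the easier one to parse; the rubydebug form is meant for humans.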

Thank you so much, Leandro, for pointing out the error.

I am still struggling to find out why Kibana shows "tags: _jsonparsefailure" on the messages.

This happens when logs are read from the file and pushed to the kafka plugin configured in the output section of Logstash.

In short, the event flows through two pipelines.
The first pipeline reads events from the incoming TCP port and writes them to a file. Your suggestion solved the issue at this point.
The second pipeline reads from the file and outputs to the kafka plugin, which forwards the events to Elasticsearch. When I check the logs in Kibana, I see "tags: _jsonparsefailure" on the messages.

I tried to write the event to a file if its tags contain _jsonparsefailure, but nothing is written to the file. Below is the change in the output section.

output {
  if "_jsonparsefailure" in [tags] {
    file {
      path => "/var/log/logstash/_jsonparsefailure.txt"
    }
  }

  kafka {
    bootstrap_servers => "aws.us-west-2.elb.amazonaws.com:9092"
    #codec => "plain"
    #codec => json
    #topic_id => "%{source_name}"
    topic_id => "github"
  }
}
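
One thing worth checking here, offered as an assumption because the rest of the setup is not shown: a conditional on [tags] only matches if the tag was added inside this pipeline. As far as I know, the kafka output uses the plain codec unless told otherwise, so if whatever consumes the "github" topic decodes the messages as JSON, the _jsonparsefailure tag would be added on that side and the file block above would never fire. A rough sketch for confirming this prints every event with its tags before it leaves this pipeline and sends JSON to Kafka:

```
output {
  # Temporary: print each event with all fields, including [tags], to the Logstash log
  stdout { codec => rubydebug }

  kafka {
    bootstrap_servers => "aws.us-west-2.elb.amazonaws.com:9092"
    topic_id => "github"
    # Assumption: the consumer of this topic expects JSON-encoded events
    codec => json
  }
}
```

If the events printed by stdout carry no _jsonparsefailure tag, the failure is happening after Kafka rather than in this pipeline.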

Please suggest how to debug the events that show this error in Kibana.

Without seeing your full pipeline, it is impossible to know what the issue is.
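
As a starting point when posting the full config, the input side of the second pipeline might look roughly like the sketch below. It assumes the first pipeline now writes whole events to the file as one JSON object per line; the json codec and start_position settings are assumptions to adapt, not the actual setup from this thread.

```
input {
  file {
    path => "/var/log/tls_logs/github1.log"
    # Assumption: each line in the file is one JSON-encoded event.
    # If the file holds raw syslog lines instead, a JSON codec here
    # would tag every event with _jsonparsefailure.
    codec => json
    start_position => "beginning"
  }
}
```

The point of the sketch is that the codec on the reading side has to match what the writing side actually produced; a mismatch between the two is one common source of the parse error shown earlier.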