Troubleshooting logstash configs

Hi all.. I've been working on ELK for about a couple of months now and I've been really liking it apart from the troubelshooting steps. I've just integrated a syslog for trial before going heavily onto production phase. Enabled syslog forwarding from one of my ubuntu VMs to ELK machine. The logs are correctly showing up in ELk but appear with _grokparsefailure tags. (syslog source is 172.31.3.152 and destination ELK is 10.48.69.68)
I ran tcpdump on the ELK machine and got below logs-

21:08:02.580778 IP 172.31.3.152.32986 > 10.48.69.68.1101: Flags [S], seq 3683519550, win 64240, options [mss 1350,sackOK,TS val 4280149539 ecr 0,nop,wscale 7], length 0
E..<1.@.8.......
0ED...M...>...............F...
...#........
21:08:02.580816 IP 10.48.69.68.1101 > 172.31.3.152.32986: Flags [S.], seq 516300327, ack 3683519551, win 28960, options [mss 1460,sackOK,TS val 390860530 ecr 4280149539,nop,wscale 7], length 0
E..<..@.@.;.
0ED.....M.....'...?..q .Y.........
.L.....#....
21:08:02.608012 IP 172.31.3.152.32986 > 10.48.69.68.1101: Flags [.], ack 1, win 502, options [nop,nop,TS val 4280149566 ecr 390860530], length 0
E..41.@.8.......
0ED...M...?...(...........
...>.L..
21:08:02.608074 IP 172.31.3.152.32986 > 10.48.69.68.1101: Flags [P.], seq 1:123, ack 1, win 502, options [nop,nop,TS val 4280149566 ecr 390860530], length 122
E...1.@.8..m....
0ED...M...?...(....7......
...>.L..<38>Aug 30 21:08:02 cd-ubuntu-jh sshd[109055]: Received disconnect from 172.20.219.16 port 64876:11: disconnected by user

21:08:02.608093 IP 10.48.69.68.1101 > 172.31.3.152.32986: Flags [.], ack 123, win 227, options [nop,nop,TS val 390860536 ecr 4280149566], length 0
E..4va@.@..7
0ED.....M.....(.........Q.....
.L.....>
21:08:02.635516 IP 172.31.3.152.32986 > 10.48.69.68.1101: Flags [P.], seq 123:751, ack 1, win 502, options [nop,nop,TS val 4280149593 ecr 390860536], length 628
E...1.@.8..r....
0ED...M.......(....l......
...Y.L..<38>Aug 30 21:08:02 cd-ubuntu-jh sshd[109055]: Disconnected from user cyberuser 172.20.219.16 port 64876
<86>Aug 30 21:08:02 cd-ubuntu-jh sshd[109001]: pam_unix(sshd:session): session closed for user cyberuser
<30>Aug 30 21:08:02 cd-ubuntu-jh systemd[1]: session-1810.scope: Succeeded.
<38>Aug 30 21:08:02 cd-ubuntu-jh systemd-logind[836]: Session 1810 logged out. Waiting for processes to exit.
<38>Aug 30 21:08:02 cd-ubuntu-jh systemd-logind[836]: Removed session 1810.
<46>Aug 30 21:08:02 cd-ubuntu-jh rsyslogd: action 'action-7-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2001.0 try https://www.rsyslog.com/e/2359 ]

21:08:02.635542 IP 10.48.69.68.1101 > 172.31.3.152.32986: Flags [.], ack 751, win 237, options [nop,nop,TS val 390860543 ecr 4280149593], length 0
E..4vb@.@..6
0ED.....M.....(...-.....Q.....
.L.....Y
21:08:06.306483 IP 172.31.3.152.32986 > 10.48.69.68.1101: Flags [P.], seq 751:922, ack 1, win 502, options [nop,nop,TS val 4280153264 ecr 390860543], length 171
E...1.@.8..:....
0ED...M...-...(.....%.....
.....L..<38>Aug 30 21:08:06 cd-ubuntu-jh sshd[109075]: Accepted publickey for cyberuser from 172.20.219.16 port 64996 ssh2: RSA SHA256:I0iwsS0G0BsGFK9swFHfwSDxmh9PCD9uBiLbVR/2UOE

21:08:06.306506 IP 10.48.69.68.1101 > 172.31.3.152.32986: Flags [.], ack 922, win 246, options [nop,nop,TS val 390861461 ecr 4280153264], length 0
E..4vc@.@..5
0ED.....M.....(.........Q.....
.L......
21:08:06.333437 IP 172.31.3.152.32986 > 10.48.69.68.1101: Flags [P.], seq 922:1213, ack 1, win 502, options [nop,nop,TS val 4280153291 ecr 390861461], length 291
E..W1.@.8.......
0ED...M.......(...........
.....L..<86>Aug 30 21:08:06 cd-ubuntu-jh sshd[109075]: pam_unix(sshd:session): session opened for user cyberuser by (uid=0)
<38>Aug 30 21:08:06 cd-ubuntu-jh systemd-logind[836]: New session 1811 of user cyberuser.
<30>Aug 30 21:08:06 cd-ubuntu-jh systemd[1]: Started Session 1811 of user cyberuser.

21:08:06.333467 IP 10.48.69.68.1101 > 172.31.3.152.32986: Flags [.], ack 1213, win 256, options [nop,nop,TS val 390861468 ecr 4280153291], length 0
E..4vd@.@..4
0ED.....M.....(.........Q.....
.L......

All I am doing is disconnecting from the ssh and logging in back with key-based-auth.
I am not really able to correlate with what is coming through network that I can see on tcpdump because there is alot of additional info coming in on network.
Now the question is how do I correlate and troubleshoot TCPDUMP logs with incoming logs to logstash.
I tried to google but no positive results.
The logfile related to logstash in /var/log doesn't show anything-

[2021-08-30T21:07:58,392][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2021-08-30T21:08:02,620][INFO ][logstash.inputs.syslog   ][main][1939d19f40efe997ee4384348ee2605a849d49dd4a440b7435c9c7888aec579a] new connection {:client=>"172.31.3.152:32986"}

my grok for syslog is below-

match => { "message" => "%{SYSLOGTIMESTAMP:time} %{SYSLOGHOST:hostname} %{DATA:program}\[%{POSINT:pid}\]: %{GREEDYDATA:message1}" }

I even tried the config provided in elastic webpage, but again it also gives _grokparsefailure.
Can someone provide the step by step working way of roubleshooting incoming logs (not restricted to only syslog).

Can you share an example of a message that you have in elasticsearch that has the _grokparsefailure tag and what your full pipeline config looks like?

From your tcpdump sample some of your messages will parse without any problem with your grok, but others will fail.

For example, the below messages should match your grok pattern.

<38>Aug 30 21:08:02 cd-ubuntu-jh sshd[109055]: Disconnected from user cyberuser 172.20.219.16 port 64876
<86>Aug 30 21:08:02 cd-ubuntu-jh sshd[109001]: pam_unix(sshd:session): session closed for user cyberuser
<30>Aug 30 21:08:02 cd-ubuntu-jh systemd[1]: session-1810.scope: Succeeded.
<38>Aug 30 21:08:02 cd-ubuntu-jh systemd-logind[836]: Session 1810 logged out. Waiting for processes to exit.
<38>Aug 30 21:08:02 cd-ubuntu-jh systemd-logind[836]: Removed session 1810.

But this one won't:

<46>Aug 30 21:08:02 cd-ubuntu-jh rsyslogd: action 'action-7-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2001.0 try https://www.rsyslog.com/e/2359 ]

To troubleshoot why you are getting a _grokparsefailure message you just need to see what is the error in your parse, no need to capture the traffic with tcpdump.

The last message for example has a different format, so you would need a different grok pattern.

Thanks @leandrojmp , below is a small screenshot from kibana that have grokparsefailure tag.


PS- I am not looking for solving current problem only, but looking for a practical solution on troubleshooting the logstash parsing. It maybe helpful to me and to many others as well. What if tomorrow I happen to face some challenges in parsing some other logs, how do I start to troubleshoot !!!
As per my timezone I can be again online in next 10 hours.

Do you have more than one grok filter in your pipeline? You didn't share your full pipeline.

The screenshot you shared has a parsed message as you have fields for logsource, pid, program and many others, so the original message was parsed somehow.

If you have more then one grok and your pipeline and you are not using conditionals, logstash will try to parse your message in every grok you have, even if it was already parsed.

Troubleshooting logstash is not much different than troubleshooting any other system, you need to know how the tool works to be able to find what is wrong.

For example, Logstash pipelines are sequential, every event will pass through every filter, it doesn't matter if you have a grok filter that already parsed your message, if you have another grok filter after that with a different pattern, logstash will try to parse the message again, and it will give an error because the pattern won't match, this is valid for every filter. To avoid things like this you can use conditionals to test if you should apply a filter to a message.

If a grok filter is not working, you need to test it step by step, test the first pattern with the first field of the message, if it works, go to the next, until you can build a working pattern that will parse your message, grok debugger in kibana helps a lot in this case.

And always check the documentation, the more you use Logstash and build pipelines, more easier will be to troubleshoot anything.

Hi @leandrojmp . I have only one line in grokpattern. My question is around finding a common way to troubleshoot with future integrations.
Is there a way to capture the exact data that is being received by logstash and then something on stdout{} ?
I remember from a past firewall integration, logstash-plain.log file was showing the data which was being received (due to a parsing error), using that I identified that grok pattern needed to be changed because syslog server was forwarding some data which was different than the documented one.
How can I see the exact incoming data and data being parsed ? For watching the incoming data I tried using tcpdump this time because logstash-plain.log log wasn't showing anything.

PS- I am using config files in conf.d directory. So not sure how stdout {} can help here.

You can set the log level to DEBUG and logstash will log what it is receiving and what it is doing, but it is a lot of logs, I would recommend setting it only temporarily.

The stdout output will send the output messages to the standard output, which is the console when running via command line or the system log file when running as a service, probably /var/log/messages.

You could also add a file output in your config using the plain codec, this will output the plain message that logstash received, consdering that you didn't removed the filter message or changed it.

As I said, you are receiving messages that do not match the grok pattern you provided, so it will give you _grokparsefailure errors, also, the screenshot that you shared has a parsed message, so something in your config parsed that message and a grok filter tried to parse it again, impossible to know without seeing the pipeline configuration.

As I said, logstash will process your events in a sequential form, one after another passing through every filter, to find what is not working in your pipeline you could save a sample of your messages in a pipeline and try to simulate it locally, adding the filters one by one to see what is wrong.

Also, if you have multiple files in conf.d that represents different pipelines, depending on how you configured pipelines.yml or how you are running logstash, it could merge all the files into just one big pipeline.

As you didn't provided more information about how you are running your pipeline and how your config looks, it is not possible to point to you how you should tackle the issue.

Thanks @leandrojmp below is my config file, plz have a look,

input {
	syslog {
		port => 1101
		tags => ["syslog1101"]
	}
}
filter {
	if "syslog1101" in [tags] {
		grok {
		      #match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
		      match => { "message" => "%{SYSLOGTIMESTAMP:time} %{SYSLOGHOST:hostname} %{DATA:program}\[%{POSINT:pid}\]: %{GREEDYDATA:message1}" }
			add_field => ["received_at", "%{@timestamp}"]
			### add_field => ["received_from", "%{host}"]
		}
		date {
			match => ["syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss"]
		}
	}
}
output {
	elasticsearch {
		hosts => ["localhost:9200"]
		index => "syslog1101"
		user => "elastic"
		password => "somePassword"
	}
       #stdout { } 
}

The commented grok pattern was from elastic.co official document(that too gave grokparsefailures), changed it to new one.
Update- I uncommented stdout in configfile, shutdown logstash and ran logstash with parameters now - /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/syslog.conf
The logline that I found in auth.log is-

Sep  1 15:13:29 cd-ubuntu-jh sudo: pam_unix(sudo:session): session opened for user root by cyberuser(uid=0)

Below is the syslog messages on stdout {}

[INFO ] 2021-09-01 15:13:30.041 [Ruby-0-Thread-51: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-syslog-3.5.0/lib/logstash/inputs/syslog.rb:172] syslog - new connection {:client=>"172.31.3.152:34636"}
{
           "message" => "pam_unix(sudo:session): session opened for user root by cyberuser(uid=0)\n",
          "priority" => 86,
          "severity" => 6,
              "host" => "172.31.3.152",
         "logsource" => "cd-ubuntu-jh",
    "facility_label" => "security/authorization",
         "timestamp" => "Sep  1 15:13:29",
              "tags" => [
        [0] "syslog1101",
        [1] "_grokparsefailure"
    ],
          "facility" => 10,
        "@timestamp" => 2021-09-01T09:43:29.000Z,
           "program" => "sudo",
    "severity_label" => "Informational",
          "@version" => "1"
}

Also with the log.level debug and starting logstash I guess I got the problem with grok failures, the log had different logline reaching the ELK ( look at <85> coming with log before timestamp )

[2021-09-01T15:18:25,094][DEBUG][logstash.filters.grok    ][main][e74d8a84ae5f02e65715fe43c3578b58b1d987b0067f594b878474fc12ede372] Running grok filter {:event=>{"message"=>"<85>Sep  1 15:18:23 cd-ubuntu-jh sudo: cyberuser : TTY=pts/0 ; PWD=/home/cyberuser ; USER=root ; COMMAND=/bin/bash\n", "@version"=>"1", "tags"=>["syslog1101"], "@timestamp"=>2021-09-01T09:48:25.075Z, "host"=>"172.31.3.152"}}
[2021-09-01T15:18:25,116][DEBUG][logstash.filters.grok    ][main][e74d8a84ae5f02e65715fe43c3578b58b1d987b0067f594b878474fc12ede372] Event now:  {:event=>{"message"=>"cyberuser : TTY=pts/0 ; PWD=/home/cyberuser ; USER=root ; COMMAND=/bin/bash\n", "@version"=>"1", "tags"=>["syslog1101"], "program"=>"sudo", "timestamp"=>"Sep  1 15:18:23", "logsource"=>"cd-ubuntu-jh", "host"=>"172.31.3.152", "@timestamp"=>2021-09-01T09:48:25.075Z, "priority"=>"85"}}

Then I went ahead and ran tcpdump and debug together on ELK

**tcpdump -nnAs0 port 1101**
15:56:24.313353 IP 172.31.3.152.34640 > 10.48.69.68.1101: Flags [P.], seq 1841660575:1841660795, ack 1621476397, win 502, options [nop,nop,TS val 139283976 ecr 429382461], length 220
E.....@.8._.....
0ED.P.Mm...`..-...........
.MN....=<85>Sep  1 15:56:24 cd-ubuntu-jh sudo: cyberuser : TTY=pts/1 ; PWD=/var/log ; USER=root ; COMMAND=/bin/bash
<86>Sep  1 15:56:24 cd-ubuntu-jh sudo: pam_unix(sudo:session): session opened for user root by cyberuser(uid=0)

You can see .MN....=<85> before the syslog timestamp, probably causing grok failures.
Log from logstash-plain.log with debug enabled-

[2021-09-01T15:56:24,316][DEBUG][logstash.filters.grok    ][main][e74d8a84ae5f02e65715fe43c3578b58b1d987b0067f594b878474fc12ede372] Running grok filter {:event=>{"message"=>"<85>Sep  1 15:56:24 cd-ubuntu-jh sudo: cyberuser : TTY=pts/1 ; PWD=/var/log ; USER=root ; COMMAND=/bin/bash\n", "@version"=>"1", "tags"=>["syslog1101"], "@timestamp"=>2021-09-01T10:26:24.315Z, "host"=>"172.31.3.152"}}
[2021-09-01T15:56:24,317][DEBUG][logstash.filters.grok    ][main][e74d8a84ae5f02e65715fe43c3578b58b1d987b0067f594b878474fc12ede372] Event now:  {:event=>{"message"=>"cyberuser : TTY=pts/1 ; PWD=/var/log ; USER=root ; COMMAND=/bin/bash\n", "@version"=>"1", "tags"=>["syslog1101"], "program"=>"sudo", "timestamp"=>"Sep  1 15:56:24", "logsource"=>"cd-ubuntu-jh", "host"=>"172.31.3.152", "@timestamp"=>2021-09-01T10:26:24.315Z, "priority"=>"85"}}

However <85> not included in grokpattern was still captured as priority-

Sometimes there are logs starting right with timestamp-

[2021-09-01T16:14:54,918][DEBUG][logstash.filters.grok    ][main][a3e05bf8207576dab9eff27a9b9c1a31d5bca62bd4d948cf322f3431e3e2b556] Running grok filter {:event=>{"message"=>"Started OpenBSD Secure Shell server.\n", "facility_label"=>"system", "priority"=>30, "@version"=>"1", "program"=>"systemd", "@timestamp"=>2021-09-01T10:44:54.000Z, "severity_label"=>"Informational", "timestamp"=>"Sep  1 16:14:54", "host"=>"172.31.3.152", "facility"=>3, "tags"=>["syslog1101"], "pid"=>"1", "logsource"=>"cd-ubuntu-jh", "severity"=>6}}

Now how to deal with such variations in incoming logs !

Ok, now it makes sense.

You are using the syslog input, this input will parse part of your message as it also runs a grok with the following configuration.

So, your message will be first parsed with the following grok pattern:

<%{POSINT:priority}>%{SYSLOGLINE}

The SYSLOGLINE pattern corresponds to the following pattern:

%{SYSLOGBASE2} %{GREEDYDATA:message}

And the SYSLOGBASE2 is:

(?:%{SYSLOGTIMESTAMP:timestamp}|%{TIMESTAMP_ISO8601:timestamp8601}) (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource}+(?: %{SYSLOGPROG}:|)

So, your syslog input will parse your message an give you the fields:

  • priority, timestamp, facility, logsource, program, pid and message

You will also get the severity, severity_label and facility_label from this input and the message field will be overwritten from the original message that was sent by your syslog.

If your syslog is sending this message:

<85>Sep  1 15:56:24 cd-ubuntu-jh sudo: cyberuser : TTY=pts/1 ; PWD=/var/log ; USER=root ; COMMAND=/bin/bash

After the parses done by the syslog input, you will be left with this value in the message field:

cyberuser : TTY=pts/1 ; PWD=/var/log ; USER=root ; COMMAND=/bin/bash

This is the message that you should try to parse, not the original message sent by your syslog, your original grok pattern will fail because none of the message that pass through the grok filter starts with a timestamp, they were already parsed before and the message field was overwritten.

You have basically two options to solve this, the first is to build a new grok pattern to try to parse this messages, if you have different types of messages, you will need a pattern from each one.

The second option is to not use the syslog input, use the tcp or udp input and build a grok based on the original message sent by the syslog.

Thank you @leandrojmp . This all makes sense now.
Also, based on your suggested debugging methods, I succeeded in integrating one firewall on syslog today :blush:
Again thank you so much...