My task is to introduce log analysis to an environment where my collector has to be a drop-in replacement for syslog, and I do not have the option of asking clients to use a different shipper.
In this environment, logs are forwarded to my machine via syslog from a number of different systems running a variety of applications. I listen to the syslog stream with Logstash and send the events to Elasticsearch.
I need to match each line against any one of a number of log format patterns and, once it is matched, parse out the appropriate fields and set metadata identifying the type.
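In other words, something like this, which I offer only as a minimal sketch (the patterns, types, and field names are hypothetical, not my actual formats):

    filter {
      # Try the first format; add_field only fires when the pattern matches.
      grok {
        match     => { "message" => "%{TIMESTAMP_ISO8601:app_ts} \[%{LOGLEVEL:level}\] %{GREEDYDATA:app_msg}" }
        add_field => { "[@metadata][log_type]" => "app_a" }
      }
      # Fall through to the next format if the first one did not match.
      if "_grokparsefailure" in [tags] {
        mutate { remove_tag => ["_grokparsefailure"] }
        grok {
          match     => { "message" => "%{COMMONAPACHELOG}" }
          add_field => { "[@metadata][log_type]" => "apache_access" }
        }
      }
    }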
I also need to recognize and deal with different types of multiline messages, appending only the syslog-message portion to the previous syslog-message rather than the entire line, syslog preamble included. This appears to be very tricky, if it is possible at all, probably requiring a combination of multiline, match, and mutate. My search for information has been complicated by the evolution of the multiline filter into a codec, which makes it hard to tell whether any given piece of advice is still relevant.
As this use case seems to me to be a nut that a few people have likely cracked, I'm looking for pointers and links that will set me on the right path.
I would start by finding out whether you can use a syslog input. Syslog is many things to many people; the syslog input expects RFC 3164 messages. Otherwise you might use a tcp input. (There was also a recent post suggesting using rsyslog to talk to all those syslog daemons and forward to Logstash.)
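For example (the ports here are illustrative):

    input {
      syslog { port => 5514 }    # parses the RFC 3164 preamble for you
      # tcp  { port => 5515 }    # raw lines; you parse the preamble yourself, e.g. with grok
    }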
If you need to selectively apply a multiline codec you may want multiple pipelines: have your main pipeline figure out which multiline treatment to apply (based on host, a pattern match, or whatever), then use tcp output/input pairs to route events to pipelines that apply the appropriate multiline codec, as sketched below. If you cannot get multiline to work it may be possible to use aggregate instead, but any pipeline running aggregate is restricted to a single pipeline worker thread (also, you want pipeline.java_execution set to false in logstash.yml until this bug is fixed).
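A minimal sketch of that arrangement; the pipeline ids, paths, ports, and routing condition are all hypothetical. The distributor forwards only the message field as plain lines, so the downstream multiline codec stitches together message portions rather than whole lines with their syslog preambles:

    # --- pipelines.yml: one distributor plus one pipeline per multiline treatment ---
    - pipeline.id: distributor
      path.config: "/etc/logstash/distributor.conf"
    - pipeline.id: app-a-multiline
      path.config: "/etc/logstash/app-a.conf"
      pipeline.workers: 1            # only required if this pipeline uses aggregate

    # --- distributor.conf: decide which treatment an event needs ---
    output {
      if [host] == "app-a-server" {                # hypothetical routing condition
        # Forward only the message text; other fields are dropped in this sketch.
        tcp { host => "127.0.0.1" port => 9901 codec => line { format => "%{message}" } }
      } else {
        elasticsearch { hosts => ["localhost:9200"] }
      }
    }

    # --- app-a.conf: apply the multiline treatment for this feed ---
    input {
      tcp {
        port  => 9901
        codec => multiline {
          pattern => "^\s"           # hypothetical: continuation lines start with whitespace
          what    => "previous"
        }
      }
    }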
On a past project where I took on several new data feeds at the same time, I found it helpful to tag data once I thought it was being parsed correctly and feed it to a different index. All the stuff that had not been parsed properly went into a 'fixme' index.
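Concretely, that routing can live in the output section (a minimal sketch; the tag and index names are hypothetical):

    output {
      if "parsed" in [tags] {
        elasticsearch { hosts => ["localhost:9200"] index => "logs-%{+YYYY.MM.dd}" }
      } else {
        # Anything not yet parsed correctly lands here for later inspection.
        elasticsearch { hosts => ["localhost:9200"] index => "fixme-%{+YYYY.MM.dd}" }
      }
    }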
Excellent - using rsyslog with its JSON output to Logstash solves a couple of issues (see the input sketch after this list):
- listening on privileged port 514
- parsing RFC syslog fields without reinventing the wheel
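On the Logstash side that boils down to something like this (a sketch; the port is an assumption on my part):

    input {
      tcp {
        port  => 10514           # unprivileged, so Logstash need not run as root
        codec => json_lines      # one JSON document per line, as emitted by an rsyslog JSON template
      }
    }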
I'll get some practice with multiple pipelines.
Thanks for the tip re the fixme index - a good practice.
I wonder if anyone can recommend a development workflow. I find myself with multiple sessions open:
- one to edit the Logstash config, then HUP the process to reload it
- another to run logger -f example-data.log to send the data
- a Kibana browser window to see the results
I suspect I should skip the Kibana step by outputting directly to the console with a stdout { codec => rubydebug } output, but maybe there are existing tools and scripts that make the Logstash SDLC a bit less wonky?
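For the moment, the bare-bones harness I'm picturing looks like this (file names are hypothetical):

    # test.conf: read events from stdin, dump parsed events to the console
    input  { stdin {} }
    output { stdout { codec => rubydebug } }

Run with bin/logstash -f test.conf < example-data.log, or add --config.reload.automatic to avoid the HUPs while editing.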