Making the best of an existing Syslog stream

My task is to introduce log analysis to an environment where my machine is a drop-in replacement for a Syslog server, and I do not have the option of asking clients to use a different shipper.

In this environment, logs are forwarded to my machine via Syslog from multiple systems running a variety of applications. I listen to the Syslog stream with Logstash and send the results to Elasticsearch.

I need to match each line against any one of a number of log format patterns and, once matched, parse out the appropriate fields and set metadata identifying the log type.
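In Logstash terms I imagine something like a chain of grok attempts, where a successful match also stamps the type. The patterns and type names below are hypothetical, just to show the shape:

    filter {
      grok {
        # First candidate format, e.g. "2023-01-05T10:00:00 WARN something happened"
        match          => { "message" => "%{TIMESTAMP_ISO8601:app_timestamp} %{LOGLEVEL:level} %{GREEDYDATA:body}" }
        add_field      => { "[@metadata][log_type]" => "app_a" }  # only added on a successful match
        tag_on_failure => []                                      # stay quiet, try the next pattern
      }
      if ![@metadata][log_type] {
        grok {
          # Second candidate format: an Apache access log line
          match     => { "message" => "%{COMMONAPACHELOG}" }
          add_field => { "[@metadata][log_type]" => "apache_access" }
        }
      }
    }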

I also need to recognize and deal with different types of multiline messages, appending only the syslog-message portion to the previous syslog-message, not the entire line including the syslog preamble. This appears to be very tricky, if it is possible at all, and may require a combination of multiline, match and mutate. My search for information on this has been complicated by the evolution of the multiline filter into a codec, which makes it hard to tell whether a given piece of advice is still relevant.
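The closest shape I can imagine (untested, and it assumes continuation lines carry leading whitespace in the message body) is to let the multiline codec join whole lines and then strip the repeated preambles afterwards:

    input {
      tcp {
        port  => 5140
        codec => multiline {
          # SYSLOGBASE matches "timestamp host program[pid]:"; a body that then
          # starts with whitespace is a continuation of the previous event.
          pattern => "^%{SYSLOGBASE} \s"
          what    => "previous"
        }
      }
    }
    filter {
      # The codec joined entire lines, so remove the embedded preambles,
      # e.g. "\nJan  5 10:00:00 host prog[123]: " (this regex is illustrative only).
      mutate {
        gsub => [ "message", "\n[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ \S+ ", "\n" ]
      }
    }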

As this use case seems to me to be a nut that a few people have likely cracked, I'm looking for pointers and links that will set me on the right path.

I would start with finding out whether you can use a syslog input. Syslog is many things to many people; the syslog input expects RFC 3164 messages. Otherwise you might use a TCP input. (Or, as a recent post suggested, use rsyslog to talk to all those syslog daemons and forward to logstash.)
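For example (the port is arbitrary; anything above 1024 sidesteps the privileged-port problem):

    input {
      # If the senders emit RFC 3164, this parses the preamble for you.
      syslog { port => 5140 }

      # Otherwise take raw lines on a plain TCP input and grok them yourself.
      # tcp { port => 5141 }
    }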

If you need to selectively apply a multiline codec you may want to have multiple pipelines. Have your main pipeline figure out what multiline treatment is needed, based on host or a pattern match or whatever, then use tcp output/input pairs to apply the multiline codec (see the sketch below). If you cannot get multiline to work it might be possible to use aggregate, but any pipeline running aggregate is restricted to a single pipeline worker thread (also, you want pipeline.java_execution set to false in logstash.yml until this bug is fixed).
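A rough sketch of that hand-off, with a made-up port, routing condition and continuation pattern:

    # Main pipeline: route events that need joining to the multiline pipeline.
    output {
      if [host] == "appserver01" {
        tcp {
          host  => "127.0.0.1"
          port  => 5678
          codec => line { format => "%{message}" }   # ship only the message body
        }
      } else {
        elasticsearch { hosts => ["localhost:9200"] }
      }
    }

    # Multiline pipeline (a separate entry in pipelines.yml): join, then index.
    input {
      tcp {
        port  => 5678
        codec => multiline {
          pattern => "^\s"      # indented lines continue the previous event
          what    => "previous"
        }
      }
    }
    output { elasticsearch { hosts => ["localhost:9200"] } }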

When I took on several new data feeds at the same time in the past, I found it helpful to tag data once I thought it was being parsed correctly and feed that to a different index. All the stuff that had not been parsed properly was fed into a 'fixme' index.
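Something like this in the output section (the tag and index names are just illustrative):

    output {
      if "parsed_ok" in [tags] {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "logs-%{+YYYY.MM.dd}"
        }
      } else {
        # Anything unparsed lands here for later inspection.
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "fixme-%{+YYYY.MM.dd}"
        }
      }
    }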

Excellent - using rsyslog with JSON output to Logstash solves a couple of issues (receiving side sketched after this list):

  • listening on privileged port 514
  • parsing the RFC 3164 syslog fields without reinventing the wheel
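The Logstash side then shrinks to something like this (the port and codec are my assumptions; rsyslog is configured with a JSON template and an omfwd action pointing at it):

    input {
      tcp {
        port  => 5514           # unprivileged; rsyslog forwards here
        codec => "json_lines"   # one JSON document per line from rsyslog's template
      }
    }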

I'll get some practice on multiple pipelines.

Thanks for the tip re the fixme index - a good practice.

I wonder if anyone can recommend a development workflow. I find myself with multiple sessions open:

  • one to edit the logstash config, then HUP the process to reload
  • another to use logger -f example-data.log to send the data
  • a kibana browser window to see the results

I suspect I should skip the Kibana step by outputting directly to the console via the rubydebug codec, but maybe there are existing tools and scripts that make the Logstash SDLC a bit less wonky?
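My current thought is a throwaway config driven from stdin, with the filters copied from the real config:

    # test.conf - pipe sample data in, watch parsed events on the console.
    input  { stdin { } }
    filter {
      # ... same filters as the production config ...
    }
    output { stdout { codec => rubydebug } }

Run as cat example-data.log | bin/logstash -f test.conf, or keep it running with --config.reload.automatic while editing, which removes the HUP step.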

- Brent
