Many input ports to templated destinations

Will_Weber · August 25, 2018, 1:58pm

Hey all!

I'm looking for recommendations on how to structure a Logstash config that has to receive nearly 300 potentially very diverse sets of logs over the network, which don't necessarily conform to any given message structure.

I.e., non-RFC-compliant syslog, compliant syslog, CEF from some sources, JSON from others.

The primary goal, however, is to identify those messages as they come in and then route each kind to its own topic on Kafka. Thankfully, due to the templating system in Logstash, the routing part is exceedingly simple.

To illustrate what I mean, here's a sample config:

input {
  tcp {
    port => 9000
    add_field => {
      topic => "syslog00"
    }
  }
  tcp {
    port => 9001
    add_field => {
      topic => "syslog01"
    }
  }
  tcp {
    port => 9002
    add_field => {
      topic => "syslog02"
    }
  }
  #...(imagine another 297 inputs similar to this)
}

output {
  kafka {
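    # Option names follow the kafka output plugin: underscores, not hyphens.
    bootstrap_servers => "kfk01.lan:9092,kfk02.lan:9092"
    # Route each event to the topic named in its "topic" field.
    topic_id => "%{[topic]}"
    compression_type => "snappy"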
  }
}

I have a distinct feeling that there is a simpler way to do this, though I can't think of one that remains as easy to understand. I'm very open to suggestions on how to make this scale better. (I'm also not sure how well Logstash would handle this many open sockets at one time, though UDP might help with that.)

Unfortunately, there isn't an easy way to identify the content of an incoming message, because the structures are so inconsistent (unstructured syslog, RFC syslog, CEF, JSON, etc.).

At first, I thought that a translate filter with a collection of hostnames or IPs would solve this, but given the variety of hosts sending to this machine and the elastic nature of resources on the network, I figured that approach would be difficult to maintain long term.

Very open to suggestions or recommendations! Thanks for reading!

magnusbaeck · August 27, 2018, 6:08am

Well, if it isn't practical to detect the message kind based on the payload, I don't see another option besides different listening ports, as in your example. I'd prefer autodetection, though, at least as the primary means of message classification. You might not get down to a single-digit port count, but surely to far fewer than 300.

Will_Weber · August 27, 2018, 2:28pm

Thanks for the response Magnus!

I'd definitely be up for considering an auto-detection method as well, though I'm not sure which plugins that would involve. Mind nudging me in the right direction?

magnusbaeck · August 27, 2018, 2:34pm

I was thinking of regexp conditions in an if statement (wrapping a mutate filter that adds fields and/or tags), or a grok filter if you want to perform parsing and detection in one fell swoop.
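
A minimal sketch of the conditional approach, assuming the payload formats can be told apart by their leading characters. The regular expressions and the topic names (cef, json, syslog_rfc5424, syslog_legacy, unclassified) are illustrative assumptions, not a tested ruleset; the topic field is the same one the kafka output in the sample config above reads.

filter {
  if [message] =~ /CEF:\d+\|/ {
    # CEF carries a "CEF:<version>|" marker, often behind a syslog header,
    # so it is checked before the syslog patterns.
    mutate { add_field => { "topic" => "cef" } }
  } else if [message] =~ /^\s*\{/ {
    # A JSON object starts with an opening brace.
    mutate { add_field => { "topic" => "json" } }
  } else if [message] =~ /^<\d{1,3}>1 / {
    # RFC 5424 syslog: "<PRI>" followed by version 1 and a space.
    mutate { add_field => { "topic" => "syslog_rfc5424" } }
  } else if [message] =~ /^<\d{1,3}>/ {
    # Anything else that starts with "<PRI>" is treated as legacy syslog.
    mutate { add_field => { "topic" => "syslog_legacy" } }
  } else {
    # Catch-all so every event still gets a topic.
    mutate { add_field => { "topic" => "unclassified" } }
  }
}

The order of the branches matters: the more specific patterns come first, and the catch-all topic collects whatever the rules miss, which is a convenient place to look when deciding whether a source needs its own rule or its own listening port.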

Will_Weber · August 30, 2018, 3:30am

Thanks for the feedback!
