Dealing with non standard log formats

I'm trying (really really trying) to make sense of the logs coming from isc-dhcpd, a DHCP server. Depending upon the stage of the DHCP request process, the format of the output is slightly different. i've got three working grok patterns but i cant work out how to stitch them together where anything other than the first pattern is considered. I've pretty much spent the entire day looking for a solution and i'm out of ideas at this point.

The grok patterns that pass in the debugger:

%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} from %{MAC:SourceMAC} via %{NOTSPACE:Interface}

%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} on %{IP:OfferedIP} to %{MAC:SourceMAC} via %{NOTSPACE:Interface}

%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} for %{IP:OfferedIP} (%{IP:DHCPServerIP}) from %{MAC:SourceMAC} via %{NOTSPACE:Interface}

Is there any way of allowing a filter to fail, and then continue to the next filter without any action so the next filter can be evaluated for a match? There may be a totally different way to do this so please feel free to tell me i'm barking up the wrong tree here.

The below doesn't work but it does evaluate the first message type:

if [fields][type] == "dhcpd" {
    grok {
      match => {
        "message" => "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} from %{MAC:SourceMAC} via %{NOTSPACE:Interface}",
        "message" => "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} on %{IP:OfferedIP} to %{MAC:SourceMAC} via %{NOTSPACE:Interface}",
        "message" => "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} for %{IP:OfferedIP} \(%{IP:DHCPServerIP}\) from %{MAC:SourceMAC} via %{NOTSPACE:Interface}"
       }
    }
    mutate {
      remove_field => [ "message" ]
    }
  }
}

Does something like this work?

grok {
  match => {
    "message" => [
      "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} from %{MAC:SourceMAC} via %{NOTSPACE:Interface}",
      "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} on %{IP:OfferedIP} to %{MAC:SourceMAC} via %{NOTSPACE:Interface}",
      "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} for %{IP:OfferedIP} \(%{IP:DHCPServerIP}\) from %{MAC:SourceMAC} via %{NOTSPACE:Interface}"
    ]
  }
}

Thanks for the rapid reply Christian. Unfortunately it has not resolved the issue...

My .conf file now looks like this:

if [fields][type] == "dhcpd" {
  grok {
    match =>
      "message" => [
        "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} from %{MAC:SourceMAC} via %{NOTSPACE:Interface}",
        "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} on %{IP:OfferedIP} to %{MAC:SourceMAC} via %{NOTSPACE:Interface}",
        "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} for %{IP:OfferedIP} \(%{IP:DHCPServerIP}\) from %{MAC:SourceMAC} via %{NOTSPACE:Interface}"
      ]
  }
  mutate {
    remove_field => [ "message" ]
  }
}

The events in Kibana have none of the expected fields and of course the message has been mutated out

I just spotted a missing pair of { } in my file as per your suggestion. I've since tried with these but i still get the grok parse failure:

new .conf below:

if [fields][type] == "dhcpd" {
  grok {
    match => {
      "message" => [
        "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} from %{MAC:SourceMAC} via %{NOTSPACE:Interface}",
        "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} on %{IP:OfferedIP} to %{MAC:SourceMAC} via %{NOTSPACE:Interface}",
        "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{WORD:DHCP_MessageType} for %{IP:OfferedIP} \(%{IP:DHCPServerIP}\) from %{MAC:SourceMAC} via %{NOTSPACE:Interface}"
      ]
    }
  }
  mutate {
    remove_field => [ "message" ]
  }
}

I'm really out of ideas at this point so any assistance here would be gratefully received. I've made progress outside of this task but really need find a way to parse out different formats from the same log. it sounds so simple but i just cant find the syntax to get it working. Community, please help!!

Below are the simplified grok patterns i'm currently working with:

"%{SYSLOGBASE} %{NOTSPACE:DHCPMessageType} for %{IP:OfferedIP} from %{MAC:SourceMAC} \(%{HOSTNAME:DHCPRequestHostname}\) via %{NOTSPACE:Interface}"

"%{SYSLOGBASE} %{NOTSPACE:DHCPMessageType} on %{IP:OfferedIP} to %{MAC:SourceMAC} via %{NOTSPACE:Interface}"

"%{SYSLOGBASE} %{NOTSPACE:DHCPMessageType} from %{MAC:SourceMAC} via %{NOTSPACE:Interface}"

First off, don't expect syslog to be easy as there is no such thing as "standard". A few things to think about...

  • It is unlikely that out of the box grok will get you there. You will probably need to create your own patterns.

  • Considering breaking the problem down into multiple chunks. For example, first focus on handling the basics of the syslog "header" - priority, timestamp, host/ip, process... and whatever weirdness various vendors and apps might throw at you. You might find this to be a good starting point: https://github.com/koiossian/synesis_lite_syslog It will at least deal with the basics and handling the header. You can then focus on the logged message.

  • You then have to have some logic that figures out what the message type is... which app, or device? Then you have to recognize which message from that app or device? Then you can start thinking about how you are going to handle that message.

Just to give you an idea... dnsmasq can send only half dozen different messages. To handle all the little variations of those messages and fully extract the value of the data, our pipeline is ~700 lines long, and that doesn't include all of the logic to deal with the header portion (which is common to all syslog messages).

I tell you all of this because you sound frustrated, which usually means that expectations were different than reality. If log management were easy it wouldn't be the multi-billion dollar market that it is. Indexing logs (Elasticsearch) and visualizing them (Kibana) is the easy part. Collecting, parsing, transforming and enriching logs, in all of the weird formats and structures that vendors use (and often don't document) is the really really hard part.

Having said all of that, you might have better luck getting help if provide a sample of your logs. Otherwise no one can do anything except guess.

2 Likes

Thanks for your assistance. I'd never heard of synesis, it looks like i'll be spending some time with it in the near future though. I agree with everything you've said, especially the lack of documentation surrounding logs from various vendors! Specifically, i'm trying to deal with the logs that come out of isc-dhcpd. The logs are simple enough to the human eye (which i know means nothing). There are essentially 4 variants i'm interested in, in 3 formats. the grok patterns above match all 4 in my testing. Some sample logs are below:

Mar 26 09:16:02 core2f dhcpd[39142]: DHCPDISCOVER from 04:a3:16:5d:9a:34 via eth0
Mar 26 09:16:03 core2f dhcpd[39142]: DHCPOFFER on 192.168.1.139 to 04:a3:16:5d:9a:30 (host-xqluye9) via eth0
Mar 26 09:16:03 core2f dhcpd[39142]: DHCPREQUEST for 192.168.1.139 (192.168.1.40) from 04:a3:16:5d:9a:30 (host-xqluye9) via eth0
Mar 26 09:16:03 core2f dhcpd[39142]: DHCPACK on 192.168.1.139 to 04:a3:16:5d:9a:30 (host-xqluye9) via eth0

For what it's worth. These are being sent to a separate file as you said in "syslog" format, as luck would have it, the header portion does match the %{SYSLOGBASE} pattern in logstash. I'm adding a field to these logs in filebeat which i'm selecting for when they arrive in logstash, so when they are being ingested they should be completely segregated

J

As it happens we have the same dhcpd logs in our lab, so I took a closer look. Unsurprisingly there is a bit of variation in the logs that you will have to construct your pipeline to handle.

I am showing only the "log message" portion of the syslog event. To make it easier to compare, I have normalized all of the values as follows:

  • Client IP: 10.10.1.111
  • Client MAC: 01:02:03:04:05:06
  • Client hostame: clihost
  • DHCP Server Interface: eth1
  • DHCP Server Interface IP: 10.10.1.1

These are the different variations I see in our environment:

DHCPDISCOVER: Client broadcast to find available DHCP servers.

DHCPDISCOVER from 01:02:03:04:05:06 via eth1
DHCPDISCOVER from 01:02:03:04:05:06 (clihost) via eth1

DHCPOFFER: Server response to client’s DHCPDISCOVER, offering configuration parameters.

DHCPOFFER on 10.10.1.111 to 01:02:03:04:05:06 via eth1
DHCPOFFER on 10.10.1.111 to 01:02:03:04:05:06 (clihost) via eth1

DHCPREQUEST: Client message to one or more servers to do any of the following:

  • Request parameters from one server and implicitly decline offers from other servers.
  • Confirm that a previously allocated address is correct after, for example, a system reboot.
  • Extend the lease of a network address.
DHCPREQUEST for 10.10.1.111 from 01:02:03:04:05:06 via eth1
DHCPREQUEST for 10.10.1.111 (10.10.1.1) from 01:02:03:04:05:06 via eth1
DHCPREQUEST for 10.10.1.111 from 01:02:03:04:05:06 (clihost) via eth1
DHCPREQUEST for 10.10.1.111 (10.10.1.1) from 01:02:03:04:05:06 (clihost) via eth1

DHCPREQUEST for 10.10.1.111 from 01:02:03:04:05:06 via eth1: ignored (not authoritative).
DHCPREQUEST for 10.10.1.111 from 01:02:03:04:05:06 (clihost) via eth1: ignored (not authoritative).
DHCPREQUEST for 10.10.1.111 (10.10.1.1) from 01:02:03:04:05:06 (clihost) via eth1: ignored (not authoritative).

DHCPACK: Server to client acknowledgment message containing configuration parameters, including a confirmed network address.

DHCPACK on 10.10.1.111 to 00:0e:c4:d0:69:39 via eth1
DHCPACK on 10.10.1.111 to d4:a3:3d:d7:5c:d3 (clihost) via eth1

DHCPINFORM: Client to server message requesting only local configuration parameters; client has an externally configured network address.

DHCPINFORM from 10.10.1.111 via eth1: not authoritative for subnet 10.10.1.0

DHCPNAK: Server to client negative acknowledgment indicating the client’s understanding of the network address is incorrect (for example, if the client has moved to a new subnet), or a client’s lease has expired.

DHCPNAK on 10.10.1.111 to 01:02:03:04:05:06 via eth1

DHCPRELEASE: Client to server message giving up the user of the network address and canceling the remaining time on the lease.

DHCPRELEASE of 10.10.1.111 from 01:02:03:04:05:06 via eth1 (not found)
DHCPRELEASE of 10.10.1.111 from 01:02:03:04:05:06 via eth1 (found)
DHCPRELEASE of 10.10.1.111 from 01:02:03:04:05:06 (clihost) via eth1 (found)

DHCPDECLINE: Client to server message indicating the network address is already being used.

# I didn't have any examples of this message type. I may try to force the generation of such a message when I get more time.

A few things should jump out here...

  1. Your sample data only includes 4 types of messages, but you need to be able to handle 8.
  2. We don't even have an example of one of the types, so we don't know what it looks like. There is also some variation within a type, which begs the question whether we have identified all variations?
  3. Sometimes MAC addresses are by themselves 01:02:03:04:05:06 and sometimes they are followed by the name reported by the client 01:02:03:04:05:06 (clihost).
  4. Sometimes the client IP is by itself 10.10.1.111 and sometimes it is followed by the IP of the interface of the DHCP server 10.10.1.111 (10.10.1.1).
  5. DHCPREQUEST messages will sometimes include additional information after the DHCP server's interface name eth1: ignored (not authoritative). (also note the trailing period which can sometimes be troublesome). Where you are using %{NOTSPACE:Interface} you will pickup the colon after the interface name if you don't account for it.

If I was to start developing this pipeline, I would probably handle each message type individually. You may be able to come up with a series of grok patterns that will handle them all in one filter, but there is a lot more we do with the data as we normalize to our data model/schema. So I would start with something like this...

(NOTE: after initial processing we have examples above in a field called log.message)

if [log][message] =~ /^DHCPACK .*/ {
    grok {
        # Grok focused only on DHCPACK here.
    }
    # Here is any additional logic to handle DCHPACK.

} else if [log][message] =~ /^DHCPREQUEST .*/ {
    grok {
        # Grok focused only on DHCPREQUEST here.
    }
    # Here is any additional logic to handle DHCPREQUEST.

} else if [log][message] =~ /^DHCPDISCOVER .*/ {
    grok {
        # Grok focused only on DHCPDISCOVER here.
    }
    # Here is any additional logic to handle DHCPDISCOVER.

} else if [log][message] =~ /^DHCPOFFER .*/ {
    grok {
        # Grok focused only on DHCPOFFER here.
    }
    # Here is any additional logic to handle DHCPOFFER.

} else if [log][message] =~ /^DHCPRELEASE .*/ {
    grok {
        # Grok focused only on DHCPRELEASE here.
    }
    # Here is any additional logic to handle DHCPRELEASE.

} else if [log][message] =~ /^DHCPINFORM .*/ {
    grok {
        # Grok focused only on DHCPINFORM here.
    }
    # Here is any additional logic to handle DHCPINFORM.

} else if [log][message] =~ /^DHCPNAK .*/ {
    grok {
        # Grok focused only on DHCPNAK here.
    }
    # Here is any additional logic to handle DHCPNAK.

} else if [log][message] =~ /^DHCPDECLINE .*/ {
    grok {
        # Grok focused only on DHCPDECLINE here.
    }
    # Here is any additional logic to handle DHCPDECLINE.

} else {
    # nothing matched above so put some generic logic and maybe add a tag indicating that it didn't match.
}

I hope you find this helpful. We will probably complete this and add it to our own library of integrations in the near future.

4 Likes

This is clearly a much better and more robust approach than i was taking. Thank you very much for taking the time to so thoroughly answer this question. I'll no doubt be able to handle the rest from here. I'd certainly like to see this included as part of the standard library (even if it was just to see how close i got from this point)

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.