Having problems parsing data

Hi there,

I'm having some issues filtering this section of a log file:

2021-01-19T13:32:25.263Z localhost {reason=user_approved, txid=cbcf50e8-e05e-4ee8-9c6d-125d78b6ff9e, ood_software=null, isotimestamp=2021-01-19T13:29:11.720467+00:00, result=success, access_device={hostname=null, is_password_set=unknown, flash_version=null, os=null, os_version=null, browser=null, ip=0.0.0.0, java_version=null, location={country=uk, city=london, state=null}, browser_version=null, is_firewall_enabled=unknown, is_encryption_enabled=unknown}, event_type=authentication, application={name=prod1, key=28374638900}

I think the issue lies with the { characters. When I run this grok pattern:

%{TIMESTAMP_ISO8601:date}%{SPACE}%{WORD:host} \{%{GREEDYDATA:data1}

Everything up to the { parses fine, and the rest gets dumped into data1:

  "date": "2021-01-19T13:32:25.263Z",
  "host": "localhost",
  "data1": "reason=user_approved, txid=cbcf50e8-e05e-4ee8-9c6d-125d78b6ff9e, ood_software=null, isotimestamp=2021-01-19T13:29:11.720467+00:00, result=success, access_device={hostname=null, is_password_set=unknown, flash_version=null, os=null, os_version=null, browser=null, ip=0.0.0.0, java_version=null, location={country=uk, city=london, state=null}, browser_version=null, is_firewall_enabled=unknown, is_encryption_enabled=unknown}, event_type=authentication, application={name=prod1, key=28374638900}"

but I can't figure out how to parse the text from "reason" onwards. Plus there are other { characters in there that I suspect are going to break something later on.

Is there some way I can parse the data inside the GREEDYDATA capture? Or can I ignore the { characters, or skip them somehow?

Many thanks for your advice.

I used GREEDYDATA for everything, so you'll want to change those to more appropriate patterns. This was also done in the Kibana Dev Tools Grok Debugger, so Logstash might require some escaping of special characters. This is just an example of how to accomplish what you are looking to do. (I think :slight_smile: )

Pattern

%{TIMESTAMP_ISO8601:date}%{SPACE}%{WORD:host}%{SPACE}{reason=%{GREEDYDATA:reason},%{SPACE}txid=%{GREEDYDATA:txid},%{SPACE}ood_software=%{GREEDYDATA:ood_software},%{SPACE}isotimestamp=%{GREEDYDATA:isotimestamp},%{SPACE}result=%{GREEDYDATA:result},%{SPACE}access_device={hostname=%{GREEDYDATA:access_device.hostname},%{SPACE}is_password_set=%{GREEDYDATA:access_device.is_password_set},%{SPACE}flash_version=%{GREEDYDATA:access_device.flash_version},%{SPACE}os=%{GREEDYDATA:access_device.os},%{SPACE}os_version=%{GREEDYDATA:access_device.os_version},%{SPACE}browser=%{GREEDYDATA:access_device.browser},%{SPACE}ip=%{GREEDYDATA:access_device.ip},%{SPACE}java_version=%{GREEDYDATA:access_device.java_version},%{SPACE}location={country=%{GREEDYDATA:access_device.location.country},%{SPACE}city=%{GREEDYDATA:access_device.location.city},%{SPACE}state=%{GREEDYDATA:access_device.location.state}},%{SPACE}browser_version=%{GREEDYDATA:access_device.browser_version},%{SPACE}is_firewall_enabled=%{GREEDYDATA:access_device.is_firewall_enabled},%{SPACE}is_encryption_enabled=%{GREEDYDATA:access_device.is_encryption_enabled}},%{SPACE}event_type=%{GREEDYDATA:event_type},%{SPACE}application={name=%{GREEDYDATA:application.name},%{SPACE}key=%{GREEDYDATA:application.key}}

Returns

{
  "date": "2021-01-19T13:32:25.263Z",
  "reason": "user_approved",
  "txid": "cbcf50e8-e05e-4ee8-9c6d-125d78b6ff9e",
  "ood_software": "null",
  "isotimestamp": "2021-01-19T13:29:11.720467+00:00",
  "result": "success",
  "access_device": {
    "is_password_set": "unknown",
    "hostname": "null",
    "os": "null",
    "flash_version": "null",
    "ip": "0.0.0.0",
    "browser": "null",
    "os_version": "null",
    "java_version": "null",
    "location": {
      "country": "uk",
      "city": "london",
      "state": "null"
    },
    "browser_version": "null",
    "is_firewall_enabled": "unknown",
    "is_encryption_enabled": "unknown"
  },
  "event_type": "authentication",
  "application": {
    "name": "prod1",
    "key": "28374638900"
  },
  "host": "localhost"
}
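
In case it helps when you move this over: a minimal, untested sketch of how the front of that pattern could sit in a Logstash pipeline. I'm assuming the literal { needs escaping as \{ there, and note that dotted names like access_device.hostname would need Logstash's [access_device][hostname] bracket syntax to come out as genuinely nested fields:

    filter {
      grok {
        # abbreviated to the first couple of fields, with GREEDYDATA tightened
        # to DATA and the stock UUID pattern; extend the same way for the rest
        # (the remainder is parked in a "rest" field here, just for this sketch)
        match => {
          "message" => "%{TIMESTAMP_ISO8601:date}%{SPACE}%{WORD:host}%{SPACE}\{reason=%{DATA:reason},%{SPACE}txid=%{UUID:txid},%{SPACE}%{GREEDYDATA:rest}"
        }
      }
    }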

I found a different approach that I haven't tested in Logstash yet, only in the debugger...

%{TIMESTAMP_ISO8601:date}%{SPACE}%{WORD:host} \{(?<drop>reason)=%{DATA:reason_action},%{SPACE}(?<drop>txid)=%{DATA:txid},%{SPACE}(?<drop>ood_software)=%{DATA:ood_software},%{SPACE}(?<drop>isotimestamp)=%{DATA:isotimestamp},%{SPACE}(?<drop>result)=%{DATA:result},%{SPACE}(?<drop>access_device)=%{DATA:device}\{(?<drop>hostname)=%{DATA:hostname},%{SPACE}(?<drop>is_password_set)=%{DATA:is_password_set},%{SPACE}(?<drop>flash_version)=%{DATA:flash_version},%{SPACE}(?<drop>os)=%{DATA:os},%{SPACE}(?<drop>os_version)=%{DATA:os_version},%{SPACE}(?<drop>browser)=%{DATA:browser},%{SPACE}(?<drop>ip)=%{IPORHOST:ip},%{SPACE}(?<drop>java_version)=%{DATA:java_version},%{SPACE}(?<drop>location)=\{(?<drop>country=)%{DATA:country},%{SPACE}(?<drop>city)=%{DATA:city},%{SPACE}(?<drop>state)=%{DATA:state}\},%{SPACE}(?<drop>browser_version)=%{DATA:browser_version},%{SPACE}(?<drop>is_firewall_enabled)=%{DATA:is_firewall_enabled},%{SPACE}(?<drop>is_encryption_enabled)=%{DATA:is_encryption_enabled},%{SPACE}(?<drop>event_type)=%{DATA:event_type},%{SPACE}(?<drop>application)=\{(?<drop>name=)%{DATA:application},%{SPACE}(?<drop>key)=%{DATA:key},%{SPACE}(?<drop>host)=%{DATA:host},%{SPACE}(?<drop>alias)=%{DATA:alias},%{SPACE}(?<drop>eventtype)=%{DATA:eventtype},%{SPACE}(?<drop>factor)=%{DATA:factor},%{SPACE}(?<drop>auth_device)=\{(?<drop>name=)%{DATA:auth_device},%{SPACE}(?<drop>location)=\{(?<drop>country=)%{DATA:country1},%{SPACE}(?<drop>city)=%{DATA:city1},%{SPACE}(?<drop>state)=%{DATA:state1}\},%{SPACE}(?<drop>ip)=%{IPORHOST:ip1}\},%{SPACE}(?<drop>user)=\{(?<drop>name=)%{USER:user1}

It's a pretty horrendous approach, and I'm sure it's not very efficient, but it seems to parse fine in the debugger. (Note I had the os and os_version captures swapped at first; fixed above.)

So, a couple of things I noticed... In my dataset I have two reports of location data (city, country, state, etc.). I don't know if both are going to be populated, so I opted to use country and country1 to keep the fields separate in case data pops in. Some, like username, I don't think I have to worry about. Interestingly, I noticed that when the same capture name appears twice, as with all the (?<drop>...) groups, the second match overwrites the first, so rather than a doc with 30 fields I cut it in half by allowing the field data to be overwritten... I think that's how it's working. I'll put it through some test data later and see how Logstash fares.
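
A tiny illustration of that overwrite behaviour, on a made-up two-pair input (hypothetical example, going by what the debugger shows):

    input:    a=1, b=2
    pattern:  (?<drop>a)=%{DATA:first},%{SPACE}(?<drop>b)=%{GREEDYDATA:second}
    result:   first = "1", second = "2", and drop = "b", because the second
              (?<drop>...) capture overwrites the first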

BTW, yours is much cleaner than mine, lol.

I would not use grok. Transform the data you have into data that a different filter can parse...

    # strip the timestamp and host, leaving the key=value blob in data1
    grok { match => { "message" => "%{TIMESTAMP_ISO8601:date}%{SPACE}%{WORD:host} {%{GREEDYDATA:data1}" } }
    mutate {
        gsub => [
            # wrap every bare token (keys and values alike) in double quotes
            "data1", "([a-zA-Z0-9_+:\.\-]+)", '"\1"',
            # turn "key"="value" into "key": "value"
            "data1", "=", ": ",
            # add the outer braces so the whole string is a JSON object
            "data1", "^", "{",
            "data1", "$", "}"
        ]
    }
    # data1 is now valid JSON, so the json filter can turn it into fields
    json { source => "data1" }
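
To make that concrete, here is a hand-traced run of the gsub steps on a short excerpt of data1 (worked by hand, not actual filter output):

    before:           reason=user_approved, location={country=uk, city=london}
    quote the tokens: "reason"="user_approved", "location"={"country"="uk", "city"="london"}
    = becomes ": ":   "reason": "user_approved", "location": {"country": "uk", "city": "london"}
    add outer braces: {"reason": "user_approved", "location": {"country": "uk", "city": "london"}}

The last line is valid JSON, which the json filter then parses into proper nested fields.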
