Dissecting Google logs textPayload

Hi, here is my Logstash config for the dissect:

filter {
  dissect {
    mapping => {
      "textPayload" => "%{something1} [%{something2} %{+something2}]  %{something3} %{something4} %{something5} %{something6} %{something7} %{something8} %{something9} %{something10} %{something11} %{something12} %{something13} %{something14}    }"
    }
  }
}

The data is in this format:

> INFO [2019-06-20 10:37:42,734] com.something.something.something.information.core.LoggingPiracyReporter: Informational request: ip_address="1.1.1.1" domain_name="domain.com" some_random_id="HrmwldM4DQNXoQF3AnYosJ0Mtig=" random_id_2="Isl/eC4ERnoLVEBMXYtWeMjwqkSKA2MPSsDnGHe4EzE=" number=1000 timestamp=1561027064 valid_token_present=true everything_ok=true [Http/1.1] [8.8.8.8, 8.8.8.8, 8.8.8.8]

I just want the IP address and domain name out of this payload. I can't seem to get grok or dissect to work. Can anyone suggest how to do this?

You can do it with dissect using

dissect { mapping => { "message" => '%{} ip_address="%{ip}" domain_name="%{name}"%{}' } }

Or, more expensively using grok with

grok { match => { "message" => 'ip_address="%{IPV4:ip}" domain_name="%{HOSTNAME:name}"' } }

Hi

Thanks for this

This works; however, it throws a warning in the logs:

[2019-07-01T14:08:00,617][WARN ][org.logstash.dissect.Dissector] Dissector mapping, pattern not found {"field"=>"textPayload", "pattern"=>"%{} ip_address=\"%{ip}\" domain_name=\"%{name}\"%{}", "event"=>{"insertId"=>"1xnceofg199f39u", "@timestamp"=>2019-07-01T13:08:00.397Z, "labels"=>{"container.googleapis.com/stream"=>"stdout", "compute.googleapis.com/resource_name"=>"fluentd-gcp-v3.2.0-f4hpp", "container.googleapis.com/pod_name"=>"service-info-56dd7f4b88-q68m2", "container.googleapis.com/namespace_name"=>"default"}, "logName"=>"loglocation/loglocation", "timestamp"=>"2019-06-21T19:04:23.653459387Z", "severity"=>"INFO", "receiveTimestamp"=>"2019-06-21T19:04:29.064943045Z", "tags"=>["_dissectfailure"], "@version"=>"1", "resource"=>{"labels"=>{"zone"=>"europe-west1-d", "namespace_id"=>"default", "cluster_name"=>"euw1d-kube-name", "container_name"=>"service-server", "instance_id"=>"5fffff47841", "pod_id"=>"service-info-56dd7f4b88-q68m2", "project_id"=>"project-id"}, "type"=>"container"}, "textPayload"=>"WARN [2019-06-21 19:04:23,653] org.eclipse.jetty.http.HttpParser: Illegal character 0x16 in state=START for buffer HeapByteBuffer@5b915ba[p=1,l=138,c=8192,r=137]={\\x16<<<\\x03\\x01\\x00\\x85\\x01\\x00\\x00\\x81\\x03\\x03\\xD1f\\xEf\\xDa)N\\r...\\x05\\x03\\x02\\x01\\x02\\x03\\xFf\\x01\\x00\\x01\\x00\\x00\\x12\\x00\\x00>>>-cache,no-store\\r\\n...\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00}\n"}}

I think that is just telling you that your textPayload field does not match the pattern that dissect is looking for.

This is what I put in my logstash config

filter {
  dissect { mapping => { "textPayload" => '%{} ip_address="%{client_ip}" domain_name="%{domain_name}"%{}' } }
}

textPayload is where the "message" is, so it pulls out the fields client_ip and domain_name without any issues. So I don't understand why I'm getting a "pattern not found" warning.

It does not match the pattern.

I am sorry.

I am not following. If it doesn't match the pattern, how is dissect pulling out the client IP and domain name without issue?

It's saying dissect failure, but the dissect is working fine.

I do not believe that is possible. It will pull out the client ip and domain name when they occur in the textPayload field, but if they are not there it cannot extract them.

The fields are always there?

Every log entry has them in the textPayload.

Occasionally the domain name is empty, like domain_name=""

But this warning is spamming my logs for every line by the looks of it.

No, they are not. The error message you posted contains a textPayload field and it does not contain ip_address or domain_name!

The textPayload field is always structured like -

INFO [2019-06-20 10:37:42,734] com.something.something.something.information.core.LoggingPiracyReporter: Informational request: ip_address="1.1.1.1" domain_name="domain.com" some_random_id="HrmwldM4DQNXoQF3AnYosJ0Mtig=" random_id_2="Isl/eC4ERnoLVEBMXYtWeMjwqkSKA2MPSsDnGHe4EzE=" number=1000 timestamp=1561027064 valid_token_present=true everything_ok=true [Http/1.1] [8.8.8.8, 8.8.8.8, 8.8.8.8]

No, it is not. Have a good day!

Ok

I think I've figured out why this is happening.

The domain name field is sometimes empty, e.g.

domain_name=""

Is there any way to ignore this and get rid of the warning?

If you use this to dissect a line like

"message" => "foo ip_address=\"1.2.3.4\" domain_name=\"\" stuff more stuff",

then you will get

        "ip" => "1.2.3.4",
      "name" => "",

dissect has no problem with empty fields.

Ok.

Well, then I'm not understanding this error.

I am using this -

dissect { mapping => { "textPayload" => '%{} ip_address="%{client_ip}" domain_name="%{domain_name}"%{}' } }

And every line in the logfile has a domain_name/ip_address part. They are all formatted like -

INFO [2019-06-20 10:37:42,734] com.something.something.something.information.core.LoggingPiracyReporter: Informational request: ip_address="1.1.1.1" domain_name="domain.com" some_random_id="HrmwldM4DQNXoQF3AnYosJ0Mtig=" random_id_2="Isl/eC4ERnoLVEBMXYtWeMjwqkSKA2MPSsDnGHe4EzE=" number=1000 timestamp=1561027064 valid_token_present=true everything_ok=true [Http/1.1] [8.8.8.8, 8.8.8.8, 8.8.8.8]

That filter will result in

        "ip" => "1.1.1.1",
      "name" => "domain.com",
   "message" => "INFO [2019-06-20 10:37:42,734] com.something.something.something.information.core.LoggingPiracyReporter: Informational request: ip_address=\"1.1.1.1\" domain_name=\"domain.com\" some_random_id=\"HrmwldM4DQNXoQF3AnYosJ0Mtig=\" random_id_2=\"Isl/eC4ERnoLVEBMXYtWeMjwqkSKA2MPSsDnGHe4EzE=\" number=1000 timestamp=1561027064 valid_token_present=true everything_ok=true [Http/1.1] [8.8.8.8, 8.8.8.8, 8.8.8.8]",

Which it does. In Discover I can see the ip/name fields etc., and they are 100% there and working.

It's the warning that concerns me and that I can't explain: it is filling up my syslogs with 20-30 GB of lines in the space of an hour. If the filter works, why the warnings?

Well, the example of the message in your first post does not contain either ip_address or domain. You don't seem to be getting that.

As the documentation says, you may need a conditional to check that the line will match the pattern.

    if [textPayload] =~ /ip_address="/ and [textPayload] =~ /domain_name="/ {
        dissect { mapping => { "textPayload" => '%{} ip_address="%{ip}" domain_name="%{name}"%{}' } }
    }

First of all, thank you for your help/patience.

But I am still totally confused.

The example in my first post is -

> INFO [2019-06-20 10:37:42,734] com.something.something.something.information.core.LoggingPiracyReporter: Informational request: ip_address="1.1.1.1" domain_name="domain.com" some_random_id="HrmwldM4DQNXoQF3AnYosJ0Mtig=" random_id_2="Isl/eC4ERnoLVEBMXYtWeMjwqkSKA2MPSsDnGHe4EzE=" number=1000 timestamp=1561027064 valid_token_present=true everything_ok=true [Http/1.1] [8.8.8.8, 8.8.8.8, 8.8.8.8]

which has an ip_address="1.1.1.1" and domain_name="domain.com"?
Every single line of JSON has those two fields like that in the textPayload field.

I've used the dissect filter plenty of times before and never had any issues, but it is not behaving like I'd expect.

I'd have thought I could have just done %{field1} %{field2} etc., using the space as the delimiter; however, this all breaks it.

I've also tried the below (as I actually want the majority of the fields out of it):

dissect { mapping => { "message" => '%{} ip_address="%{ip}" domain_name="%{name}" some_random_id="%{some_random_id}" random_id_2="%{random_id_2}" number="%{number}"%{}' } }

Now this works, but the number="%{number}" part breaks it (if I remove the number part, it works fine); however, I still get the warning in the Logstash logs.
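
A note on the number part: in the sample line posted above, the field appears as number=1000, without quotes, so a pattern that expects number="..." cannot find its delimiters. A minimal sketch of the same mapping with that one change, assuming the line arrives in textPayload (as stated earlier in the thread) and keeping the field names from the attempt above:

```
filter {
  # number=1000 is unquoted in the sample line, so there are no quotes
  # around %{number}; the space that follows it ends the field.
  dissect {
    mapping => {
      "textPayload" => '%{} ip_address="%{ip}" domain_name="%{name}" some_random_id="%{some_random_id}" random_id_2="%{random_id_2}" number=%{number} %{}'
    }
  }
}
```

With the sample line this should give number as the string "1000"; the remaining key/value pairs fall into the trailing %{} skip field.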

Sorry, my mistake. Look at the example data you posted in the 3rd post in this thread. The textPayload, that starts with WARN, does not contain ip_address or domain_name.
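
To see which lines trigger the warning, one option is to print only the events that dissect tagged as failed. A minimal sketch, assuming the default _dissectfailure tag (it appears in the tags of the event quoted in the warning above) and a stdout output added temporarily for debugging:

```
output {
  # Print only the events that the dissect filter could not match
  if "_dissectfailure" in [tags] {
    stdout { codec => rubydebug }
  }
}
```

Any textPayload printed this way, such as the Jetty WARN line, has no ip_address or domain_name in it. Wrapping the dissect in the conditional suggested earlier keeps those lines away from the filter, which should stop the warning.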