Cannot parse suricata.eve.http.content_range

I'm using the suricata module from beats 7.17.3 to parse Suricata 6.0.4 EVE JSON logs and I'm getting the following parse error:

"error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [suricata.eve.http.content_range] tried to parse field [content_range]
as object, but found a concrete value"
"content_range"=>"bytes 0-859529/859530"

There does not appear to be a mapping for suricata.eve.http.content_range in the filebeat template.

Looks like content_range can be duplicated, e.g.:

   "http":{
      "hostname":"msedge.b.tlu.dl.delivery.mp.microsoft.com",
      "url":"/filestreamingservice/files/27c205c0-4d23-4061-a4dc-efff6a98e2e1?P1=1664827696&P2=404&P3=2&P4=MMn%2fpKLx4Eq88yAKJTbbGwv%2bKwbRxxVmMpR9A6OasnYmOY5Lq7jyUuf3rr7d1sxATsv%2fPP3W0ZnCsCOSymhbbw%3d%3d",
      "http_user_agent":"Microsoft BITS/7.8",
      "xff":"127.0.0.1",
      "http_content_type":"application/x-chrome-extension",
      "content_range":{
         "raw":"bytes 1140-1205/23709",
         "start":1140,
         "end":1205,
         "size":23709
      },
      "accept":"*/*",
      "accept_encoding":"identity",
      "cache_control":"max-age=0",
      "range":"bytes=1140-1205",
      "age":"2400730",
      "content_length":"66",
      "content_range":"bytes 1140-1205/23709",

Which really isn't good to have. I guess I'll ping the suricata folks.

It's been reported: Bug #5320: Key collisions in HTTP JSON eve-logs - Suricata - Open Information Security Foundation

@opoplawski did you run filebeat setup -e before you ingested any data?
EDIT: Ohh, I looked at your bug... that is not good!

Thinking about a fix / hack to the pipeline...

Yikes, I cannot even easily simulate this because that is not valid JSON syntax!

I'm cleaning things up in logstash:

# suricata filebeat module
filter {
  if "suricata" in [tags] {
    mutate {
      # https://redmine.openinfosecfoundation.org/issues/5320
      # "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [suricata.eve.http.content_range] tried to parse field [content_range] as object, but found a concrete value"}}}}
      remove_field => [ "[suricata][eve][http][content_range]" ]
    }
  }
}

Does that work? How does it know which one to remove, or are you removing both for now?

I guess I'm not sure yet - need to wait for an event to occur again. Maybe throw in two remove_fields to get both. I really don't particularly care about content_range here.

You got my interest... I may try to force one through and see what happens... :slight_smile: if I get a little time.

Soooo

Where did you get this from? Is that out of filebeat? I am surprised it will actually output that.

So what I think is going to happen, based on what I see, is that whichever content_range is defined last (the string or the object) will be the one in the JSON, because that is how the JSON decoder handles duplicate keys... then you will drop content_range every time...

i.e. I ran a test with "content_range":"bytes 1140-1205/23709" and "content_range":{"raw":"bytes 1111-1205/23709","start":1140,"end":1205,"size":23709} in different orders, and the JSON that ends up after decoding in Logstash is whatever comes last...

SO I think if you wanted to keep most of the "content_range":{"raw":"bytes 1111-1205/23709","start":1140,"end":1205,"size":23709} objects, you could check whether "[suricata][eve][http][content_range][raw]" exists, and if it does not... then drop the field.

Here is my test

Data File

{"http":{"hostname":"msedge.b.tlu.dl.delivery.mp.microsoft.com","content_range":{"raw":"bytes 1140-1205/23709","start":1140,"end":1205,"size":23709},"url":"/filestreamingservice/files/27c205c0-4d23-4061-a4dc-efff6a98e2e1?P1=1664827696&P2=404&P3=2&P4=MMn%2fpKLx4Eq88yAKJTbbGwv%2bKwbRxxVmMpR9A6OasnYmOY5Lq7jyUuf3rr7d1sxATsv%2fPP3W0ZnCsCOSymhbbw%3d%3d","http_user_agent":"Microsoft BITS/7.8","xff":"127.0.0.1","http_content_type":"application/x-chrome-extension","content_range":"bytes 1140-1205/23709","accept":"*/*","accept_encoding":"identity","cache_control":"max-age=0","range":"bytes=1140-1205","age":"2400730","content_length":"66"}}
{"http":{"hostname":"msedge.b.tlu.dl.delivery.mp.microsoft.com","content_range":{"raw":"bytes 1140-1205/23709","start":1140,"end":1205,"size":23709},"url":"/filestreamingservice/files/88888888-4d23-4061-a4dc-efff6a98e2e1?P1=1664827696&P2=404&P3=2&P4=MMn%2fpKLx4Eq88yAKJTbbGwv%2bKwbRxxVmMpR9A6OasnYmOY5Lq7jyUuf3rr7d1sxATsv%2fPP3W0ZnCsCOSymhbbw%3d%3d","http_user_agent":"Microsoft BITS/7.8","xff":"127.0.0.1","http_content_type":"application/x-chrome-extension","content_range":"bytes 1140-1205/23709","accept":"*/*","accept_encoding":"identity","cache_control":"max-age=0","range":"bytes=1140-1205","age":"2400730","content_length":"66"}}
{"http":{"hostname":"msedge.b.tlu.dl.delivery.mp.microsoft.com","url":"/filestreamingservice/files/88888888-4d23-4061-a4dc-efff6a98e2e1?P1=1664827696&P2=404&P3=2&P4=MMn%2fpKLx4Eq88yAKJTbbGwv%2bKwbRxxVmMpR9A6OasnYmOY5Lq7jyUuf3rr7d1sxATsv%2fPP3W0ZnCsCOSymhbbw%3d%3d","http_user_agent":"Microsoft BITS/7.8","xff":"127.0.0.1","http_content_type":"application/x-chrome-extension","content_range":"bytes 1140-1205/23709","accept":"*/*","accept_encoding":"identity","cache_control":"max-age=0","range":"bytes=1140-1205","age":"2400730","content_length":"66","content_range":{"raw":"bytes 1111-1205/23709","start":1140,"end":1205,"size":23709}}}

Then I just use beats to read and send to logstash...

filebeat.inputs:
- type: filestream
  id: my-filestream-id
  enabled: true
  paths:
    - /Users/sbrown/workspace/elastic-install/7.17.3/logstash-7.17.3/config/bad-json.json

output.logstash:
  hosts: ["localhost:5044"]
And the Logstash side:

input {
	beats {
		port => 5044
		codec => "json"
	}
}


##
# filter {
#     mutate {
#       # https://redmine.openinfosecfoundation.org/issues/5320
#       # "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [suricata.eve.http.content_range] tried to parse field [content_range] as object, but found a concrete value"}}}}
#       remove_field => [ "[http][content_range]" ]
#     }
# }

output{
  stdout {}
}

Then you will see in Logstash that the codec keeps the last one... so unfortunately, if that is what the message looks like above, you will always get the 2nd instance of content_range.

NOW, all that said... what I think is happening is that occasionally you are getting a message that does NOT have the 2nd string content_range, and then when it tries to write the content_range object it fails. Or vice versa.

SO I think you could detect the object and turn it back into the string... or parse the string every time and make it the content_range object.
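For the parse-the-string direction, a ruby filter could do it. A rough sketch (untested, and it assumes the field path matches what the suricata module produces):

filter {
  if "suricata" in [tags] {
    ruby {
      code => '
        cr = event.get("[suricata][eve][http][content_range]")
        # Only rewrite when the bare string form is the one that survived the decode
        if cr.is_a?(String)
          m = cr.match(/\Abytes (\d+)-(\d+)\/(\d+)\z/)
          if m
            event.set("[suricata][eve][http][content_range]",
              { "raw" => cr, "start" => m[1].to_i, "end" => m[2].to_i, "size" => m[3].to_i })
          else
            # Could not parse the range; keep the text under raw so the object mapping still works
            event.set("[suricata][eve][http][content_range]", { "raw" => cr })
          end
        end
      '
    }
  }
}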

And since the mapping appears to be dynamic... whichever type arrives first wins the mapping!

According to the very top error... the mapping is a content_range object... so you need to decide what to do. Unfortunately I think the JSON decoder is just going to pick the last one each time... it seems like you got an object in first... you could put in code to detect if it is a string and then parse it yourself.

Or detect if it is a string (not an object) and just rename it.
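e.g. rename the string form to a field of its own (content_range_raw is just a name I made up here; it has no mapping in the template, so dynamic mapping would type it as a keyword/text on first arrival):

filter {
  if "suricata" in [tags] {
    if [suricata][eve][http][content_range] and ![suricata][eve][http][content_range][raw] {
      mutate {
        # String form arrived; move it out of the way of the object mapping
        rename => { "[suricata][eve][http][content_range]" => "[suricata][eve][http][content_range_raw]" }
      }
    }
  }
}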

Apologies that was a lot... but interesting problem :slight_smile:
