Cannot parse suricata.eve.http.content_range

I'm using the suricata module from beats 7.17.3 to parse Suricata 6.0.4 EVE JSON logs and I'm getting the following parse error:

"error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [suricata.eve.http.content_range] tried to parse field [content_range]
as object, but found a concrete value"
"content_range"=>"bytes 0-859529/859530"

There does not appear to be a mapping for suricata.eve.http.content_range in the filebeat template.

Looks like content_range can be duplicated, e.g.:

   "http":{
      "hostname":"msedge.b.tlu.dl.delivery.mp.microsoft.com",
      "url":"/filestreamingservice/files/27c205c0-4d23-4061-a4dc-efff6a98e2e1?P1=1664827696&P2=404&P3=2&P4=MMn%2fpKLx4Eq88yAKJTbbGwv%2bKwbRxxVmMpR9A6OasnYmOY5Lq7jyUuf3rr7d1sxATsv%2fPP3W0ZnCsCOSymhbbw%3d%3d",
      "http_user_agent":"Microsoft BITS/7.8",
      "xff":"127.0.0.1",
      "http_content_type":"application/x-chrome-extension",
      "content_range":{
         "raw":"bytes 1140-1205/23709",
         "start":1140,
         "end":1205,
         "size":23709
      },
      "accept":"*/*",
      "accept_encoding":"identity",
      "cache_control":"max-age=0",
      "range":"bytes=1140-1205",
      "age":"2400730",
      "content_length":"66",
      "content_range":"bytes 1140-1205/23709",

Which really isn't good to have. I guess I'll ping the suricata folks.

It's been reported: Bug #5320: Key collisions in HTTP JSON eve-logs - Suricata - Open Information Security Foundation

@opoplawski did you run filebeat setup -e before you ingested any data?
EDIT: Ohh, I looked at your bug... that is not good!

Thinking about a fix / hack to the pipeline...

Yikes, I cannot even easily simulate this because that is not valid JSON syntax!

I'm cleaning things up in logstash:

# suricata filebeat module
filter {
  if "suricata" in [tags] {
    mutate {
      # https://redmine.openinfosecfoundation.org/issues/5320
      # "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [suricata.eve.http.content_range] tried to parse field [content_range] as object, but found a concrete value"}}}}
      remove_field => [ "[suricata][eve][http][content_range]" ]
    }
  }
}

Does that work? How does it know which one to remove, or are you removing both for now?

I guess I'm not sure yet - need to wait for an event to occur again. Maybe throw in two remove_fields to get both. I really don't particularly care about content_range here.

You got my interest... I may try to force one through and see what happens... :slight_smile: if I get a little time.

Soooo

Where did you get this from? Is that out of filebeat? I am surprised it will actually output that.

So what I think is going to happen, based on what I see, is that whichever content_range is defined last (the string or the object) will be the one in the JSON, because that is how the JSON decoder handles duplicate keys... then you will drop content_range every time...

i.e. I ran a test with "content_range":"bytes 1140-1205/23709" and "content_range":{"raw":"bytes 1111-1205/23709","start":1140,"end":1205,"size":23709} in different orders, and the JSON that ends up after decoding in Logstash is whatever comes last...

SO I think if you wanted to keep most of the "content_range":{"raw":"bytes 1111-1205/23709","start":1140,"end":1205,"size":23709} objects, you could check whether "[suricata][eve][http][content_range][raw]" exists, and if it does not... then drop the field.

Here is my test

Data File

{"http":{"hostname":"msedge.b.tlu.dl.delivery.mp.microsoft.com","content_range":{"raw":"bytes 1140-1205/23709","start":1140,"end":1205,"size":23709},"url":"/filestreamingservice/files/27c205c0-4d23-4061-a4dc-efff6a98e2e1?P1=1664827696&P2=404&P3=2&P4=MMn%2fpKLx4Eq88yAKJTbbGwv%2bKwbRxxVmMpR9A6OasnYmOY5Lq7jyUuf3rr7d1sxATsv%2fPP3W0ZnCsCOSymhbbw%3d%3d","http_user_agent":"Microsoft BITS/7.8","xff":"127.0.0.1","http_content_type":"application/x-chrome-extension","content_range":"bytes 1140-1205/23709","accept":"*/*","accept_encoding":"identity","cache_control":"max-age=0","range":"bytes=1140-1205","age":"2400730","content_length":"66"}}
{"http":{"hostname":"msedge.b.tlu.dl.delivery.mp.microsoft.com","content_range":{"raw":"bytes 1140-1205/23709","start":1140,"end":1205,"size":23709},"url":"/filestreamingservice/files/88888888-4d23-4061-a4dc-efff6a98e2e1?P1=1664827696&P2=404&P3=2&P4=MMn%2fpKLx4Eq88yAKJTbbGwv%2bKwbRxxVmMpR9A6OasnYmOY5Lq7jyUuf3rr7d1sxATsv%2fPP3W0ZnCsCOSymhbbw%3d%3d","http_user_agent":"Microsoft BITS/7.8","xff":"127.0.0.1","http_content_type":"application/x-chrome-extension","content_range":"bytes 1140-1205/23709","accept":"*/*","accept_encoding":"identity","cache_control":"max-age=0","range":"bytes=1140-1205","age":"2400730","content_length":"66"}}
{"http":{"hostname":"msedge.b.tlu.dl.delivery.mp.microsoft.com","url":"/filestreamingservice/files/88888888-4d23-4061-a4dc-efff6a98e2e1?P1=1664827696&P2=404&P3=2&P4=MMn%2fpKLx4Eq88yAKJTbbGwv%2bKwbRxxVmMpR9A6OasnYmOY5Lq7jyUuf3rr7d1sxATsv%2fPP3W0ZnCsCOSymhbbw%3d%3d","http_user_agent":"Microsoft BITS/7.8","xff":"127.0.0.1","http_content_type":"application/x-chrome-extension","content_range":"bytes 1140-1205/23709","accept":"*/*","accept_encoding":"identity","cache_control":"max-age=0","range":"bytes=1140-1205","age":"2400730","content_length":"66","content_range":{"raw":"bytes 1111-1205/23709","start":1140,"end":1205,"size":23709}}}

Then I just use beats to read and send to logstash...

filebeat.inputs:
- type: filestream
  id: my-filestream-id
  enabled: true
  paths:
    - /Users/sbrown/workspace/elastic-install/7.17.3/logstash-7.17.3/config/bad-json.json

output.logstash:
  hosts: ["localhost:5044"]
And the Logstash side:

input {
	beats {
		port => 5044
		codec => "json"
	}
}


##
# filter {
#     mutate {
#       # https://redmine.openinfosecfoundation.org/issues/5320
#       # "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [suricata.eve.http.content_range] tried to parse field [content_range] as object, but found a concrete value"}}}}
#       remove_field => [ "[http][content_range]" ]
#     }
# }

output{
  stdout {}
}

Then you will see in Logstash that the codec keeps the last one... so unfortunately, if that is what the message looks like above, you will always get the 2nd instance of content_range.

NOW, all that said... what I think is happening is that occasionally you are getting a message that does NOT have the 2nd string content_range, and then when it tries to write the content_range object it fails. Or vice versa.

SO I think you could detect the object and turn it back into the string... or parse the string every time and make it the content_range object.
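For the parse-the-string direction, a ruby filter could do it. A rough sketch (untested, and it assumes the field path matches what the suricata module produces):

filter {
  if "suricata" in [tags] {
    ruby {
      code => '
        cr = event.get("[suricata][eve][http][content_range]")
        # Only rewrite when the bare string form is the one that survived the decode
        if cr.is_a?(String)
          m = cr.match(/\Abytes (\d+)-(\d+)\/(\d+)\z/)
          if m
            event.set("[suricata][eve][http][content_range]",
              { "raw" => cr, "start" => m[1].to_i, "end" => m[2].to_i, "size" => m[3].to_i })
          else
            # Could not parse the range; keep the text under raw so the object mapping still works
            event.set("[suricata][eve][http][content_range]", { "raw" => cr })
          end
        end
      '
    }
  }
}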

And since the mapping appears to be dynamic... whichever type arrives first wins the mapping!

According to the very top error... the mapping is a content_range object... so you need to decide what to do. Unfortunately I think the JSON decoder is just going to pick the last one each time... it seems like you got an object in first... you could put in code to detect if it is a string and then parse it yourself.

Or detect if it is a string (not an object) and just rename it.
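e.g. rename the string form to a field of its own (content_range_raw is just a name I made up here; it has no mapping in the template, so dynamic mapping would type it as a keyword/text on first arrival):

filter {
  if "suricata" in [tags] {
    if [suricata][eve][http][content_range] and ![suricata][eve][http][content_range][raw] {
      mutate {
        # String form arrived; move it out of the way of the object mapping
        rename => { "[suricata][eve][http][content_range]" => "[suricata][eve][http][content_range_raw]" }
      }
    }
  }
}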

Apologies that was a lot... but interesting problem :slight_smile:
