Umbrella S3 logs in a weird format when using the CSV filter

Hello!

I'm currently pulling Cisco Umbrella logs from S3 buckets with Logstash and the s3 input, and I'm running into some weird behavior.

When I use only the s3 input and send the logs to Elastic, it works like a charm. But as soon as I add a csv filter to parse the logs, it looks like the charset is wrong and the filter fails to parse them.

Doing the parsing in another pipeline via pipeline-to-pipeline communication works, although for whatever reason I have to build the @timestamp field myself because the month comes out wrong.

Since I have a workaround this isn't urgent, but it isn't ideal either, as I would like to do the parsing in the same pipeline.
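
For reference, here is roughly what the pipeline-to-pipeline workaround looks like (the pipeline address and the split into two files below are illustrative, not my exact config). The first pipeline only runs the s3 input and forwards events; the second one receives them, runs the csv filter, and rebuilds @timestamp with a date filter:

# First pipeline: s3 input only, forward everything downstream
output {
    pipeline {
        send_to => ["umbrella_dns_parsing"]
    }
}

# Second pipeline: receive and parse
input {
    pipeline {
        address => "umbrella_dns_parsing"
    }
}
filter {
    # same csv / mutate filters as below, plus rebuilding @timestamp:
    date {
        match => ["event_creation_time", "yyyy-MM-dd HH:mm:ss"]
        target => "@timestamp"
    }
}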

Here is my S3 input:

input {
    s3 {
        access_key_id => "${s3_access_key_id}"
        id => "cisco_umbrella_aws_s3_bucket"
        bucket => "${s3_bucket}"
        region => "${s3_region}"
        secret_access_key => "${s3_secret_access_key}"
        prefix => "${s3_prefix}/dnslogs"
        add_field => {
            "[log_category]" => "network"
            "[log_subcategory]" => "dns"
            "[log_vendor]" => "cisco"
            "[log_product]" => "umbrella"
        }
    }
}

Here is the filter part of my pipeline for the DNS logs:

filter {
  csv {
    columns => [
      "[cisco][umbrella][_tmp][time]",
      "[cisco][umbrella][identity]",
      "[cisco][umbrella][identities]",
      "[source][address]",
      "[source][nat][ip]",
      "[cisco][umbrella][action]",
      "[dns][question][type]",
      "[dns][response_code]",
      "[dns][question][name]",
      "[cisco][umbrella][categories]",
      "[cisco][umbrella][policy_identity_type]",
      "[cisco][umbrella][identity_types]",
      "[cisco][umbrella][blocked_categories]"
    ]
    id => "cisco_umbrella_dns_parsing_csv"
  }
  mutate {
    rename => {
      "[cisco][umbrella][_tmp][time]" => "event_creation_time"
    }
    split => {
      "[cisco][umbrella][identities]" => ","
      "[cisco][umbrella][identity_types]" => ","
      "[cisco][umbrella][categories]" => ","
      "[cisco][umbrella][blocked_categories]" => ","
    }
    remove_field => ["[cisco][umbrella][_tmp]"]
    id => "cisco_umbrella_dns_mutate"
  }
  date {
    match => ["event_creation_time", "yyyy-MM-dd HH:mm:ss"]
    id => "cisco_umbrella_dns_date"
  }
  mutate {
    add_field => {
      "[event][action]" => "dns-request-%{[cisco][umbrella][action]}"
      "[observer][type]" => "dns"
    }
    id => "cisco_umbrella_ecs_compliance_mutate"
  }
}

To illustrate, here is a log in the correct format when using only the input:

"2024-10-09 07:19:57","John Doe (JohnDoe@mycompany.com)","John Doe (JohnDoe@mycompany.com),Default Site,DEVICENAME,SITE","10.10.10.10","1.1.1.1","Allowed","1 (A)","NXDOMAIN","mydomain.dns.domain.","","Group","Goup,Sites,and other,stuff",""

And here is part of a log as soon as I add the csv filter:

1tList2d (A)",0:00","JohnIllow Doe"Group,erIllow Lis"Group,erIl3es,and other stuff,stuff",""

It looks as if the charset was changed and the input can no longer delimit the logs correctly. Has anyone seen this behavior before?

One last piece of information: I'm running Logstash version 8.13.4.

Thanks in advance for your help!

Daniel