Date Filter on Unix Epoch Failing

The date filter fails when attempting to match a Unix epoch integer field, and I can't figure out why.

The data is ingested from XML using the following Logstash config:

input {
    file {
        path => [ "/home/ben/dmarc-reports/AvidXchange.com!domain.com!1586232003!1586318403.xml" ]
        start_position => "beginning"
        mode => "read"
        sincedb_path => "/home/ben/.sincedb-dmarctest"
        exit_after_read => true
        file_completed_action => "log"
        file_completed_log_path => "/home/ben/dmarccompleted.log"
        codec => multiline {
            pattern => "<feedback>"
            negate => "true"
            what => "previous"
        }
    }
}

filter {
    xml {
        source => "message"
        target => "parsed_xml"
        store_xml => false
        xpath => [
            "/feedback/report_metadata/org_name/text()", "reporting_org",
            "/feedback/report_metadata/report_id/text()", "report_id",
            "/feedback/report_metadata/date_range/begin/text()", "report_start",
            "/feedback/report_metadata/date_range/end/text()", "report_end",
            "/feedback/record/row/source_ip/text()", "email_server_ip",
            "/feedback/record/row/policy_evaluated/dkim/text()", "policy_dkim",
            "/feedback/record/row/policy_evaluated/spf/text()", "policy_spf",
            "/feedback/auth_results/dkim/result/text()", "auth_dkim",
            "/feedback/auth_results/spf/result/text()", "auth_spf"
        ]
    }
    mutate {
        convert => {
            "report_start" => "integer"
            "report_end" => "integer"
        }
    }
    date {
        match => [ "report_start", "UNIX", "UNIX_MS" ]
        #target => "report_start"
    }
    date {
        match => [ "report_end", "UNIX", "UNIX_MS" ]
        target => "report_end_time"
    }
    dns {
        reverse => [ "email_server_ip" ]
    }
    if '<?xml version="1.0" encoding="UTF-8" ?>' in [message] { drop {} }
}

output {
    elasticsearch {
        index => "logstash_dmarcxml_%{+YYYY.MM.dd}"
    }
}

Also, does anyone know of a list of XPath functions that Logstash accepts?

Hello @ben-10

Would it be possible to replace the elasticsearch { } output with stdout { codec => rubydebug } so you can see the actual values of report_start and report_end?
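For reference, a minimal debug output along those lines would be (temporarily swapping out the elasticsearch block):

output {
    # Print every event, with all its fields, to the console
    stdout { codec => rubydebug }
}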

Regarding XPath, Logstash uses the Ruby library https://github.com/sparklemotion/nokogiri and should support a wide range of XPath expressions. See its documentation.
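Nokogiri is backed by libxml2 and implements XPath 1.0, so the core XPath 1.0 function library (string(), count(), substring(), contains(), and so on) should be usable inside the xpath option. A hypothetical example (the record_count field name is illustrative):

xpath => [
    # string() yields the text content as a scalar string
    "string(/feedback/report_metadata/org_name)", "reporting_org",
    # count() yields the number of matching nodes
    "count(/feedback/record)", "record_count"
]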

I would also suggest moving the following line to the beginning of the filter block, so that matching events skip the other filters.

if '<?xml version="1.0" encoding="UTF-8" ?>' in [message] { drop {} }

Even better, I would write the multiline codec to include <?xml version="1.0" encoding="UTF-8" ?> (if it is repeated before every XML root).
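For example, anchoring the pattern on the XML declaration instead of <feedback> could look like this (a sketch, assuming every report file starts with that declaration):

codec => multiline {
    # Start a new event whenever a line begins with the XML declaration
    pattern => "^<\?xml"
    negate => true
    what => "previous"
}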

Also, please upgrade the XML filter to version 4.1.0, as it includes a fix for a memory leak triggered by malformed XML.
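The bundled plugin manager can do the update from the Logstash install directory:

bin/logstash-plugin update logstash-filter-xml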

Thanks for the tip on upgrading the plugin! And I moved the conditional statement higher up in the filter block. Here's the snippet of the rubydebug output for those two fields:

       "report_start" => [
        [0] 1586232003
    ],
         "report_end" => [
        [0] 1586318403
    ],

Also, I forgot to mention that the functions link on that page pointing to the W3C is dead.

The fields you want to convert are arrays of strings, not strings. I would try something like this:

date {
    match => [ "report_end[0]", "UNIX", "UNIX_MS" ]
    target => "report_end_time"
}

Worth a shot, but that came back with:

Invalid FieldReference: `report_start[0]`

Why would that field be an array of strings after mutating it into an integer before running the date filter?

Sorry, it seems to be an array of integers. Well, only one integer actually, but still an array.

No worries, with XML documents I feel like I know nothing. I foresee my day filled with reading technical documents :nerd_face:

Try [report_start][0]
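That is the Logstash field-reference syntax: each path segment, including the array index, gets its own set of brackets. Applied to both filters, the working config would look something like this (keeping only the UNIX pattern, since the sample values are in seconds; report_start_time is an assumed target name):

date {
    # [report_start][0] addresses the first element of the array
    match => [ "[report_start][0]", "UNIX" ]
    target => "report_start_time"
}
date {
    match => [ "[report_end][0]", "UNIX" ]
    target => "report_end_time"
}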


And that worked!

I also created a template in Elasticsearch with static mappings that cast "report_start" and "report_end" as integers, instead of relying on Logstash to mutate and convert, and I created target fields "report_start_time" and "report_end_time" mapped as dates.
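For anyone following along, a legacy index template along those lines might look roughly like this (a sketch; the template name, index pattern, and field list are assumptions):

PUT _template/logstash_dmarcxml
{
    "index_patterns": ["logstash_dmarcxml_*"],
    "mappings": {
        "properties": {
            "report_start": { "type": "integer" },
            "report_end": { "type": "integer" },
            "report_start_time": { "type": "date" },
            "report_end_time": { "type": "date" }
        }
    }
}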

The last time I tried all this was back on the 5.x stack, storing the XML into sub-fields instead of using XPaths. Lots of new things to learn! Thank you for all the help!
