Filebeat: send only a snippet of an XML file

Hey! I'm new to the ELK stack, and I have an issue trying to send an XML file to Logstash.

The XML file I want to send is more than 280k lines long, but I'm only interested in sending the part from one XPath to the end of the file.

How should I configure my filebeat.inputs in order to achieve this?

filebeat.inputs:
  - type: filestream
    id: my-filestream-id
    enabled: true
    paths:
      - /home/ansible/ansible_openscap/oscap-reports/*.xml

    parsers:
      - multiline:
          type: pattern
          pattern: '^<\?xml*'
          flush_pattern: '^[\S]*<\/Benchmark>'
          negate: true
          match: after
          max_lines: 100000000000000
        close_eof: true

At the moment it looks something like this. I manage to send a good section of the XML file, but not the entire one; I was able to see this in the Logstash output.

I also need to process some idref attributes and other tags inside some XPaths.

Thanks!

Hi Keldane!

As I understand it, you are interested in a few XPaths from your XML. Filebeat multiline does not seem very promising for handling this. Couldn't you send the complete file to Logstash and handle the parsing there using the xml filter?

That's the idea, but we have to use Filebeat to send the file; we can't just send it to Logstash over scp.

Does one logfile always contain exactly one valid XML document? If so, I guess you should just remove the flush_pattern you defined.
Otherwise, can you share a minimal example of your logfile and the parts that you receive in Logstash?

I'm not very familiar with XML, but yeah, I think so.

Here is a little snippet of the XML:

<?xml version="1.0" encoding="UTF-8"?>
<Benchmark xmlns="http://checklists.nist.gov/xccdf/1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="xccdf_org.ssgproject.content_benchmark_RHEL-9" resolved="1" xml:lang="en-US" style="SCAP_1.2">
  <status date="2024-05-16">draft</status>
  <title xmlns:xhtml="http://www.w3.org/1999/xhtml" xml:lang="en-US">Guide to the Secure Configuration of Red Hat Enterprise Linux 9</title>
  <description xmlns:xhtml="http://www.w3.org/1999/xhtml" xml:lang="en-US">This guide presents a catalog of security-relevant
configuration settings for Red Hat Enterprise Linux 9. It is a rendering of
content structured in the eXtensible Configuration Checklist Description Format (XCCDF)
in order to support security automation.  The SCAP content is
is available in the <html:code xmlns:html="http://www.w3.org/1999/xhtml">scap-security-guide</html:code> package which is developed at
#
#.....
#
# Over 240k lines of xml 
#
#.....
#
<TestResult id="xccdf_org.open-scap_testresult_xccdf_org.ssgproject.content_profile_ospp_customized" start-time="2024-05-21T15:25:40+01:00" end-time="2024-05-21T15:26:43+01:00" version="0.1.73" test-system="cpe:/a:redhat:openscap:1.3.10">
    <benchmark href="/usr/share/xml/scap/ssg/content/ssg-rhel9-ds.xml" id="xccdf_org.ssgproject.content_benchmark_RHEL-9"/>
    <title>OSCAP Scan Result</title>
    <identity authenticated="false" privileged="false">root</identity>
    <profile idref="xccdf_org.ssgproject.content_profile_ospp_customized"/>
    <target>rh-oscap-1</target>
    <target-address>127.0.0.1</target-address>
    <target-address>192.168.1.100</target-address>
    <target-address>0:0:0:0:0:0:0:1</target-address>
    <target-address>fd8c:1eef:18b2:2645:be24:11ff:fef4:c5a4</target-address>
    <target-address>fe80:0:0:0:be24:11ff:fef4:c5a4</target-address>
    <target-facts>
      <fact name="urn:xccdf:fact:scanner:name" type="string">OpenSCAP</fact>
      <fact name="urn:xccdf:fact:scanner:version" type="string">1.3.10</fact>
      <fact name="urn:xccdf:fact:asset:identifier:fqdn" type="string">rh-oscap-1</fact>
    </target-facts>
    <platform idref="#package_sudo"/>
    <platform idref="#package_dnf"/>
    <platform idref="cpe:/o:redhat:enterprise_linux:9"/>
    <platform idref="#machine"/>
    <set-value idref="xccdf_org.ssgproject.content_value_var_aide_scan_notification_email">root@localhost</set-value>

#
# Another 500 lines of set-value and rule-result
#

    <rule-result idref="xccdf_org.ssgproject.content_rule_audit_perm_change_success_ppc64le" role="full" time="2024-05-21T15:26:43+01:00" severity="medium" weight="1.000000">
      <result>notselected</result>
      <ident system="https://ncp.nist.gov/cce">CCE-86002-3</ident>
    </rule-result>
    <rule-result idref="xccdf_org.ssgproject.content_rule_audit_rules_for_ospp" role="full" time="2024-05-21T15:26:43+01:00" severity="medium" weight="1.000000">
      <result>notselected</result>
      <ident system="https://ncp.nist.gov/cce">CCE-85991-8</ident>
    </rule-result>
    <score system="urn:xccdf:scoring:default" maximum="100.000000">48.611111</score>
  </TestResult>
  
</Benchmark>

Here is the first log entry after Filebeat stops reading the file:

{"log.level":"debug","@timestamp":"2024-05-28T12:28:53.710Z","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*logFile).Read","file.name":"filestream/filestream.go","file.line":131},"message":"End of file reached: /home/ansible/ansible_openscap/oscap-reports/rh-oscap-1.xml; Backoff now.","service.name":"filebeat","id":"my-filestream-id","source_file":"filestream::my-filestream-id::native::695959-64512","path":"/home/ansible/ansible_openscap/oscap-reports/rh-oscap-1.xml","state-id":"native::695959-64512","ecs.version":"1.6.0"}

It seems like Filebeat hit an EOF, but it has read less than half of the file...

The part that I receive in Logstash can be something like 10-20k lines, so it's a bit too much to paste here.

Edit:

After some research in the XML, I found that Filebeat hits an EOF here:

     # Build full_rule while avoid adding double spaces when other_filters is empty
        if [ "${#syscall_a[@]}" -gt 0 ]
        then
            syscall_string=""
            for syscall in "${syscall_a[@]}"
            do
#here                syscall_string+=" -S $syscall"
            done
        fi

Yes, this is part of a message inside the XML.

Update: the issue is not an EOF character in the XML file.

2024-05-28 16:54:38.965305489 +0000 UTC m=+6.335550826 write error: data size (11554286 bytes) is greater than the max file size (10485760 bytes)

My config file now looks like this:

filebeat.inputs:
  - type: filestream
    id: xml-oscap
    enabled: true
    encoding: utf-8
    #message_max_bytes: 20971520
    max_bytes: 20971520
    paths:
      - /home/ansible/ansible_openscap/oscap-reports/*.xml
    parsers:
      - multiline:
          type: pattern
          pattern: '^<\?xml*'
          #flush_pattern: '^[\S]*<\/Benchmark>'
          negate: true
          match: after
          max_lines: 1000000000
      #     max_bytes: 20971520
      #   close_eof: true

I tried both parameters, message_max_bytes and max_bytes.

But I receive the same error every time I run Filebeat.

I have no experience with such large files, so I cannot really help.
The documentation states you need message_max_bytes for the filestream input: filestream input | Filebeat Reference [8.13] | Elastic
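
For reference, here is a minimal sketch of where that option would sit, reusing the input id and path from the config above. It rests on some assumptions: message_max_bytes is set at the input level (not inside the multiline parser), the 20 MiB value is simply the one already tried in this thread, and the pattern is written as '^<\?xml', which is a guess at what the original '^<\?xml*' was meant to match.

```yaml
filebeat.inputs:
  - type: filestream
    id: xml-oscap
    enabled: true
    encoding: utf-8
    paths:
      - /home/ansible/ansible_openscap/oscap-reports/*.xml
    # Input-level cap on the size of a single event; bytes beyond it are
    # discarded. 20971520 bytes = 20 MiB (value taken from this thread).
    message_max_bytes: 20971520
    parsers:
      - multiline:
          type: pattern
          # Every line that does NOT start with "<?xml" is appended to the
          # previous event, so one XML document becomes one event.
          pattern: '^<\?xml'
          negate: true
          match: after
          max_lines: 1000000000
```

Since max_bytes belongs to the older log input, it is probably ignored by a filestream input, so only message_max_bytes is kept in this sketch.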

Did you restart Filebeat after changing the config?
Hope you fix this!

I tried both of them multiple times, max_bytes and message_max_bytes.

And yeah, I'm not running Filebeat as a daemon; I start it with this command:

filebeat -c /etc/filebeat/filebeat.yml

That should load Filebeat using the YAML file I pass in the command.

Also, I managed to send the files to Logstash, but it keeps giving me the same error:

write error: data size (11554286 bytes) is greater than the max file size (10485760 bytes)

That happens pretty much every time: I finally got it running, but I still get the same error message.
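
One possibility worth checking, offered as an assumption rather than a confirmed diagnosis: 10485760 bytes is 10 MiB, which is the default rotation size for Filebeat's own log files (logging.files.rotateeverybytes) as well as the default for message_max_bytes. The wording "max file size" and the debug-level log entry quoted earlier suggest the error may come from Filebeat trying to write an 11 MB debug line into its own log file, not from the input. That would also fit with the events still reaching Logstash. A minimal sketch of logging settings that would test this idea:

```yaml
# Assumption: the "write error" is raised by Filebeat's log-file writer,
# not by the filestream input.

# Debug level can log whole events; info level avoids writing an ~11 MB line.
logging.level: info

logging.files:
  # Size at which Filebeat rotates its own log file.
  # Default is 10485760 (10 MiB); 20971520 is 20 MiB.
  rotateeverybytes: 20971520
```

If the error disappears with logging.level set to info alone, the input limits were probably never the cause.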