XML multiline files

Hi everyone! My name is Matias Aguero, and I need to ingest a lot of XML files into Elasticsearch, but I can't get it to work. My XML files have this structure:

    <?xml version="1.0" encoding="utf-8"?>
    <detailedreport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="https://www.veracode.com/schema/reports/export/1.0" xsi:schemaLocation="https://www.veracode.com/schema/reports/export/1.0 https://analysiscenter.veracode.com/resource/detailedreport.xsd" report_format_version="1.5" account_id="" app_name="" app_id="" analysis_id="" static_analysis_unit_id="" sandbox_name="" sandbox_id="" first_build_submitted_date="" version="" build_id="" submitter="" platform="" assurance_level="" business_criticality="" generation_date="" veracode_level="" total_flaws="" flaws_not_mitigated="" teams="" life_cycle_stage="" planned_deployment_date="" last_update_time="" is_latest_build="" policy_name="" policy_version="" policy_compliance_status="" policy_rules_status="" grace_period_expired="" scan_overdue="" business_owner="" business_unit="" tags="" legacy_scan_engine="">
      <static-analysis rating="" score="" submitted_date="" published_date="" version="" analysis_size_bytes="" engine_version="">
          <module name="" compiler="" os="" architecture="" loc="" score="" numflawssev0="" numflawssev1="" numflawssev2="" numflawssev3="" numflawssev4="" numflawssev5="" />
      <severity level="5" />
      <severity level="4" />
      <severity level="3">
        <category categoryid="" categoryname="" pcirelated="">
            <para text="" />
            <para text="" />
            <para text="" />
          <cwe cweid="" cwename="" pcirelated="" owasp="" owasp2013="" sans="" owaspmobile="" certjava="">
              <text text="" />
              <flaw severity="" categoryname="" count="" issueid="" module="" type="" description="" note="" cweid="" remediationeffort="" exploitLevel="" categoryid="" pcirelated="" date_first_occurrence="" remediation_status="" cia_impact="" grace_period_expires="" affects_policy_compliance="" mitigation_status="" mitigation_status_desc="" sourcefile="" line="" sourcefilepath="" scope="" functionprototype="" functionrelativelocation="" />
      <severity level="2" />
      <severity level="1" />
      <severity level="0" />
      <flaw-status new="" reopen="" open="" cannot-reproduce="" fixed="" total="" not_mitigated="" sev-1-change="" sev-2-change="" sev-3-change="" sev-4-change="" sev-5-change="" />
        <customfield name="Custom 1" value="" />
        <customfield name="Custom 2" value="" />
        <customfield name="Custom 3" value="" />
        <customfield name="Custom 4" value="" />
        <customfield name="Custom 5" value="" />
        <customfield name="Custom 6" value="" />
        <customfield name="Custom 7" value="" />
        <customfield name="Custom 8" value="" />
        <customfield name="Custom 9" value="" />
        <customfield name="Custom 10" value="" />
      <software_composition_analysis third_party_components="" violate_policy="" components_violated_policy="">
          <component component_id="" file_name="" sha1="" vulnerabilities="" max_cvss_score="" version="" library="" vendor="" description="" component_affects_policy_compliance="" new="">
              <file_path value="" />
              <license name="" spdx_id="" license_url="" risk_rating="" />
            <vulnerabilities />
            <violated_policy_rules />

What have you tried and what did not work?

Hi! I tried this Logstash pipeline:

    input {
      file {
        path => "/veracode/*.xml"
        type => "xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
          pattern => "<"
          negate => true
          what => "previous"
        }
      }
    }
    filter {
      xml {
        source => "message"
        store_xml => false
        target => "xml"
      }
    }
    output {
      stdout {
        codec => rubydebug
      }
      elasticsearch {
        action => "index"
        hosts => ["elasticsearch:9200"]
        index => "veracode"
      }
    }

If you set store_xml to false and do not include the xpath option then the xml filter is a no-op.
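
As a minimal sketch of the alternative, the filter below keeps the parsed document instead of discarding it. The field name `report` is only an illustrative choice, not something from the pipeline above.

    filter {
      xml {
        source => "message"
        # true is the default; the parsed XML is written into the target field
        store_xml => true
        # "report" is a hypothetical field name chosen for this example
        target => "report"
      }
    }

With this in place, the rubydebug output should show a nested `report` field, provided that `message` holds one complete XML document.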

Does your multiline codec work? Do you see each document in elasticsearch being a complete XML document?

Are you trying to consume the whole of each file as a single event? (You cannot parse multiple XML documents from a single event.)

Honestly, I have never used the xml filter before; now it is clearer to me.

Yes, I consider each XML document to be a single event, and I would like to parse each one as a separate document. How should I proceed?

If your multiline codec is working then just change store_xml to be true.

However, it is very unlikely that the codec is working since it says to combine lines that do not contain <, and you do not have any lines like that.

Once again, are you trying to consume the whole of each file as a single event? If so, you can do this.
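
One way to sketch this, assuming every file contains exactly one report that begins with an `<?xml` declaration (as in the sample above), is to append every line that does not contain that declaration to the previous line. The flush interval and the size limits are guesses and would need tuning for real reports.

    input {
      file {
        path => "/veracode/*.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
          # Lines that do NOT contain the XML declaration are joined to the previous line
          pattern => "<\?xml"
          negate => true
          what => "previous"
          # Emit the buffered event after 2 seconds without new lines,
          # otherwise the last document waits for another matching line
          auto_flush_interval => 2
          # The defaults are 500 lines and 10 MiB; these larger values are assumptions
          max_lines => 100000
          max_bytes => "50 MiB"
        }
      }
    }

If a report exceeds `max_lines` or `max_bytes`, the codec flushes early and the xml filter then receives a truncated document, so it is worth checking the event tags for a multiline flush warning.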

Hi Badger! I tried that, and it ingested the complete XML into the "message" field, like this:

I need to ingest each value as its own field, like in a JSON document. Basically, I need to build custom dashboards for my team and need to filter on each value in the XML. Do you think I should convert the documents to JSON for this? How should I proceed?
Thanks for everything!
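
A separate conversion to JSON should not be needed, since Logstash sends each event to Elasticsearch as a JSON document. As a rough sketch, assuming the whole report is already in `message`, the xml filter can copy selected attributes into their own fields with `xpath`. The three attributes and the destination field names below are picked from the sample purely for illustration.

    filter {
      xml {
        source => "message"
        # Only extract the values listed under xpath; do not keep the whole parsed tree
        store_xml => false
        # Strip the default Veracode namespace so plain element names work in XPath
        remove_namespaces => true
        xpath => {
          "/detailedreport/@app_name" => "app_name"
          "/detailedreport/@policy_compliance_status" => "policy_compliance_status"
          "/detailedreport/static-analysis/@score" => "static_analysis_score"
        }
      }
    }

Each destination field comes back as an array of strings, so numeric values such as the score would probably need a mutate conversion before they are useful in a dashboard.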
