XML multiline files

Hi everyone! My name is Matias Aguero, and I need to ingest a lot of XML files into Elasticsearch, but I can't get it to work. My XML files have this structure:

    <?xml version="1.0" encoding="utf-8"?>
    <detailedreport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="https://www.veracode.com/schema/reports/export/1.0" xsi:schemaLocation="https://www.veracode.com/schema/reports/export/1.0 https://analysiscenter.veracode.com/resource/detailedreport.xsd" report_format_version="1.5" account_id="" app_name="" app_id="" analysis_id="" static_analysis_unit_id="" sandbox_name="" sandbox_id="" first_build_submitted_date="" version="" build_id="" submitter="" platform="" assurance_level="" business_criticality="" generation_date="" veracode_level="" total_flaws="" flaws_not_mitigated="" teams="" life_cycle_stage="" planned_deployment_date="" last_update_time="" is_latest_build="" policy_name="" policy_version="" policy_compliance_status="" policy_rules_status="" grace_period_expired="" scan_overdue="" business_owner="" business_unit="" tags="" legacy_scan_engine="">
      <static-analysis rating="" score="" submitted_date="" published_date="" version="" analysis_size_bytes="" engine_version="">
        <modules>
          <module name="" compiler="" os="" architecture="" loc="" score="" numflawssev0="" numflawssev1="" numflawssev2="" numflawssev3="" numflawssev4="" numflawssev5="" />
        </modules>
      </static-analysis>
      <severity level="5" />
      <severity level="4" />
      <severity level="3">
        <category categoryid="" categoryname="" pcirelated="">
          <desc>
            <para text="" />
            <para text="" />
          </desc>
          <recommendations>
            <para text="" />
          </recommendations>
          <cwe cweid="" cwename="" pcirelated="" owasp="" owasp2013="" sans="" owaspmobile="" certjava="">
            <description>
              <text text="" />
            </description>
            <staticflaws>
              <flaw severity="" categoryname="" count="" issueid="" module="" type="" description="" note="" cweid="" remediationeffort="" exploitLevel="" categoryid="" pcirelated="" date_first_occurrence="" remediation_status="" cia_impact="" grace_period_expires="" affects_policy_compliance="" mitigation_status="" mitigation_status_desc="" sourcefile="" line="" sourcefilepath="" scope="" functionprototype="" functionrelativelocation="" />
            </staticflaws>
          </cwe>
        </category>
      </severity>
      <severity level="2" />
      <severity level="1" />
      <severity level="0" />
      <flaw-status new="" reopen="" open="" cannot-reproduce="" fixed="" total="" not_mitigated="" sev-1-change="" sev-2-change="" sev-3-change="" sev-4-change="" sev-5-change="" />
      <customfields>
        <customfield name="Custom 1" value="" />
        <customfield name="Custom 2" value="" />
        <customfield name="Custom 3" value="" />
        <customfield name="Custom 4" value="" />
        <customfield name="Custom 5" value="" />
        <customfield name="Custom 6" value="" />
        <customfield name="Custom 7" value="" />
        <customfield name="Custom 8" value="" />
        <customfield name="Custom 9" value="" />
        <customfield name="Custom 10" value="" />
      </customfields>
      <software_composition_analysis third_party_components="" violate_policy="" components_violated_policy="">
        <vulnerable_components>
          <component component_id="" file_name="" sha1="" vulnerabilities="" max_cvss_score="" version="" library="" vendor="" description="" component_affects_policy_compliance="" new="">
            <file_paths>
              <file_path value="" />
            </file_paths>
            <licenses>
              <license name="" spdx_id="" license_url="" risk_rating="" />
            </licenses>
            <vulnerabilities />
            <violated_policy_rules />
          </component>
        </vulnerable_components>
      </software_composition_analysis>
    </detailedreport>

What have you tried and what did not work?

Hi! I tried this Logstash pipeline:

    input {
      file {
        path => "/veracode/*.xml"
        type => "xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
          pattern => "<"
          negate => true
          what => "previous"
        }
      }
    }
    filter {
      xml {
        source => "message"
        store_xml => false
        target => "xml"
      }
    }

    output {
      stdout {
        codec => rubydebug
      }
      elasticsearch {
        action => "index"
        hosts => ["elasticsearch:9200"]
        index => "veracode"
      }
    }

If you set store_xml to false and do not include the xpath option then the xml filter is a no-op.
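As an illustration of the xpath route, here is a minimal sketch that keeps store_xml set to false but gives the filter something to do. The two attributes (app_name, total_flaws) come from the sample report above; the destination field names are only illustrative, and remove_namespaces is assumed to be needed because the report declares a default namespace, which would otherwise stop unprefixed XPath names from matching. This has not been run against a real report.

```
filter {
  xml {
    source => "message"
    # Nothing is stored as a nested structure; only the xpath results are kept.
    store_xml => false
    # Assumption: strip the default Veracode namespace so plain element names match.
    remove_namespaces => true
    xpath => {
      # XPath expression => destination field (destination names are illustrative)
      "/detailedreport/@app_name"    => "app_name"
      "/detailedreport/@total_flaws" => "total_flaws"
    }
  }
}
```

Each destination field is written as an array of strings, even when the expression matches a single attribute.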

Does your multiline codec work? Do you see each document in elasticsearch being a complete XML document?

Are you trying to consume the whole of each file as a single event? (You cannot parse multiple XML documents from a single event.)

Honestly, I have never used the xml filter before; it is clearer to me now.

Yes, I consider that each XML document is a particular event and I would like to analyze each one as such, a separate document. How should I proceed to do this?

If your multiline codec is working then just change store_xml to be true.

However, it is very unlikely that the codec is working since it says to combine lines that do not contain <, and you do not have any lines like that.

Once again, are you trying to consume the whole of each file as a single event? If so, you can do this.
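One common way to read a whole file as a single event, which may or may not be what the reply had in mind, is to give the multiline codec a pattern that never matches any line, so that every line is appended to the previous one. In the sketch below the pattern string is an arbitrary placeholder, and the values for auto_flush_interval, max_lines and max_bytes are assumptions that would need tuning to the size of the real reports.

```
input {
  file {
    path => "/veracode/*.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      # Placeholder text that is assumed never to start a line in the reports.
      pattern => "^THIS_NEVER_MATCHES"
      # Every line that does NOT match is joined to the previous line,
      # so the whole file accumulates into one event.
      negate => true
      what => "previous"
      # Flush the pending event after 2 seconds without new lines;
      # without this the event waits for a matching line that never comes.
      auto_flush_interval => 2
      # Assumed limits; the codec cuts events off at its defaults otherwise.
      max_lines => 20000
      max_bytes => "50 MiB"
    }
  }
}
```

If an event arrives tagged multiline_codec_max_lines_reached or multiline_codec_max_bytes_reached, the limits are too low for that file and the XML in the message field will be incomplete.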

Hi Badger! I tried that, and the complete XML document is now ingested into the "message" field, like this:


I need to ingest each value as its own field, as if it were a JSON document. Basically, I need to build custom dashboards for my team and be able to filter on each value in the XML. Do you think I should convert the documents to JSON for this? How should I proceed?
Thanks for everything!
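A separate JSON conversion step is probably not required, because the xml filter can turn the document into fields directly. The sketch below is one possible shape, not a tested solution: it assumes the whole report is already in the message field, pulls every flaw element out with xpath, splits them into one event per flaw, and parses each flaw's attributes into fields. The field names flaw_xml and flaw are hypothetical.

```
filter {
  # Step 1: extract each <flaw .../> element as an XML string into an array.
  xml {
    source => "message"
    store_xml => false
    # Assumption: the default namespace must be stripped for "//flaw" to match.
    remove_namespaces => true
    xpath => { "//flaw" => "flaw_xml" }
  }

  # Step 2: emit one event per entry of the flaw_xml array.
  split {
    field => "flaw_xml"
  }

  # Step 3: parse the single flaw element so its attributes become fields
  # such as [flaw][severity] and [flaw][cweid].
  xml {
    source => "flaw_xml"
    target => "flaw"
    force_array => false
  }

  # Drop the raw XML once it has been parsed, to keep the documents small.
  mutate {
    remove_field => ["message", "flaw_xml"]
  }
}
```

With one Elasticsearch document per flaw, dashboard filters can work on plain fields like flaw.severity instead of on a deeply nested array. Reports that contain no flaw elements would need separate handling, since the flaw_xml field would not exist for them.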
