No idea how to set up a pipeline for XML parsing

Hello,

I know there are a lot of similar topics, because I was trying to figure out from them how to parse logs from XML into Elastic. What I'm trying to do is:

<record>
  <date>2024-11-08T18:32:56.379787054Z</date>
  <millis>1731090776379</millis>
  <nanos>787054</nanos>
  <sequence>102</sequence>
  <logger>org.forgerock.openidm.relationship.SignalPropagationCalculatorFactory</logger>
  <level>INFO</level>
  <class>org.forgerock.openidm.relationship.SignalPropagationCalculatorFactory</class>
  <method>getSignalPropagationCalculator</method>
  <thread>15</thread>
  <message>Smart-signaling disabled: false</message>
</record>

Each record should end up as one log event, with the XML elements split into separate fields, for example fieldxml.date, fieldxml.millis. I was trying to figure it out with ChatGPT, but even that couldn't help me. I tried something like this:

filter {
  if "xml" in [log][file][path] {
    # wrap the message in a <record> root element
    mutate {
      replace => { "message" => "<record>%{message}</record>" }
    }

    # parse the xml into a temporary field
    xml {
      source => "message"
      target => "parsed_xml"
      store_xml => true
      force_array => false
    }

    # promote the parsed elements to top-level fields
    mutate {
      rename => {
        "[parsed_xml][date]"     => "date"
        "[parsed_xml][millis]"   => "millis"
        "[parsed_xml][nanos]"    => "nanos"
        "[parsed_xml][sequence]" => "sequence"
        "[parsed_xml][logger]"   => "logger"
        "[parsed_xml][level]"    => "level"
        "[parsed_xml][class]"    => "class"
        "[parsed_xml][method]"   => "method"
        "[parsed_xml][thread]"   => "thread"
        "[parsed_xml][message]"  => "log_message"
      }
    }

    mutate {
      remove_field => ["message", "parsed_xml"]
    }
  }
}
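If I understand the xml filter correctly, the config above should produce an event with the fields split out, roughly like this (a sketch built from the sample record above, not actual output):

{
  "date": "2024-11-08T18:32:56.379787054Z",
  "millis": "1731090776379",
  "sequence": "102",
  "level": "INFO",
  "logger": "org.forgerock.openidm.relationship.SignalPropagationCalculatorFactory",
  "log_message": "Smart-signaling disabled: false"
}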

Thanks for helping! 🙂

Use multiline to merge the lines of each record into one event.
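For example, if logstash reads the file directly, a minimal sketch would be a multiline codec on the input (the path is a placeholder):

input {
  file {
    path => "/opt/path/to/my/xml.log"
    codec => multiline {
      pattern => "^<record>"
      negate => true
      what => "previous"
    }
  }
}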

Okay, but where should I put that multiline? In filebeat.yml or in my pipeline on Elastic?

The code you initially posted is for a logstash pipeline, not filebeat or even elasticsearch.

Usually the data flow is one of the following:

  • filebeat -> elasticsearch
  • filebeat -> logstash -> elasticsearch
  • logstash -> elasticsearch

Filebeat (beats) is a "standalone" binary deployed on the host where you want to collect data, while logstash can indeed collect logs too but requires a JVM to run.

Once collected, the events are sent to elasticsearch and, if requested, an ingest pipeline is executed on each event during ingestion, resulting in a new document within your target index.
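For illustration, a minimal ingest pipeline could look like this (the pipeline name and field names are just examples, not something from your setup):

PUT _ingest/pipeline/my-xml-pipeline
{
  "description": "example pipeline run at ingest time",
  "processors": [
    { "rename": { "field": "record.message", "target_field": "log_message", "ignore_missing": true } }
  ]
}

Filebeat can then reference it with the `pipeline` option of its elasticsearch output.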

So in your case, since you mentioned filebeat, I assume you plan to use filebeat to read the logs, then send them to logstash or directly to elasticsearch.

Then, to process XML-formatted log data with filebeat, you can indeed use multiline to capture your complete xml entry as the message, and then parse that "message" field into structured fields with either filebeat's decode_xml processor or logstash's xml filter.

filebeat.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    - /opt/path/to/my/xml.log
  parsers:
    - multiline:
        type: pattern
        pattern: '^<record>'
        negate: true
        match: after

# xml conversion can be done within the same filebeat.yml or handed over to logstash.
# if kept in the same file:
processors:
  - decode_xml:
      field: message
      target_field: "record"
      overwrite_keys: true

# any output (here logstash is not necessary)
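# e.g. straight to elasticsearch (host is a placeholder, adjust to your cluster):
output.elasticsearch:
  hosts: ["https://localhost:9200"]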

Here is a similar thread about it.

PS: This is my first post on the platform, so not sure if details are sufficient.

I don't have access to the config I implemented. I used the new agent, which is equivalent to filebeat. The config above looks like what I remember.