Logstash + XML + Multiple split

Hello!

I wonder if someone could help me split my XML into several events for further parsing.

I have the following XML file (yes, everything is on one line):

<nodes><PROCESS_GROUP><PROCESS A="1" B="2" C="3" D="4" E="5">process_1</PROCESS><PROCESS A="6" B="7" C="8" D="9" E="10">process_2</PROCESS></PROCESS_GROUP><SESSION_GROUP><SESSION AA="11" BB="22" CC="33">session_11</SESSION><SESSION AA="44" BB="55" CC="66">session_22</SESSION></SESSION_GROUP></nodes>

and I would like Logstash to emit 4 events like this:

[event_1]
"process" : "<PROCESS A="1" B="2" C="3" D="4" E="5">process_1</PROCESS>"

[event_2]
"process" : "<PROCESS A="6" B="7" C="8" D="9" E="10">process_2</PROCESS>"

[event_3]
"session" : "<SESSION AA="11" BB="22" CC="33">session_11</SESSION>"

[event_4]
"session" : "<SESSION AA="44" BB="55" CC="66">session_22</SESSION>"

I tried using the following conf file:

input {
  file {
    path => "/files/input"
    start_position => "beginning"
  }
}

filter {
  xml {
    source => "message"
    store_xml => false
    xpath => [
      "/nodes/PROCESS_GROUP/PROCESS", "process",
      "/nodes/SESSION_GROUP/SESSION", "session"
    ]
  }

  split {
    field => "process"
  }
  split {
    field => "session"
  }
}


output {
  stdout {}
}

However, I received these events:

[event_1]
"process" : "<PROCESS A="1" B="2" C="3" D="4" E="5">process_1</PROCESS>"
"session" : "<SESSION AA="11" BB="22" CC="33">session_11</SESSION>"

[event_2]
"process" : "<PROCESS A="1" B="2" C="3" D="4" E="5">process_1</PROCESS>"
"session" : "<SESSION AA="44" BB="55" CC="66">session_22</SESSION>"

[event_3]
"process" : "<PROCESS A="6" B="7" C="8" D="9" E="10">process_2</PROCESS>"
"session" : "<SESSION AA="11" BB="22" CC="33">session_11</SESSION>"

[event_4]
"process" : "<PROCESS A="6" B="7" C="8" D="9" E="10">process_2</PROCESS>"
"session" : "<SESSION AA="44" BB="55" CC="66">session_22</SESSION>"

I understand that this is expected Logstash behavior, but it would be great if someone could show me how to achieve my initial goal.
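
The cross product comes from how split works: it emits one copy of the event per array element and leaves every other field untouched, so the second split runs on each event produced by the first. A sketch of the same filter section, annotated with the events that exist at each stage (P1/P2 and S1/S2 are shorthand for the PROCESS and SESSION elements, not real field values):

```
filter {
  xml {
    source => "message"
    store_xml => false
    xpath => [
      "/nodes/PROCESS_GROUP/PROCESS", "process",
      "/nodes/SESSION_GROUP/SESSION", "session"
    ]
  }
  # 1 event:  process = [P1, P2], session = [S1, S2]

  split {
    field => "process"
  }
  # 2 events: (process = P1, session = [S1, S2])
  #           (process = P2, session = [S1, S2])

  split {
    field => "session"
  }
  # 4 events: (P1, S1), (P1, S2), (P2, S1), (P2, S2)
}
```

Getting one event per element therefore requires that no event carries both arrays at the moment it is split.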

Thanks!

Okay, it seems I found a solution.

I extended my initial XML file to this one:

<nodes><PROCESS_GROUP><PROCESS A="1" B="2" C="3" D="4" E="5">process_1</PROCESS><PROCESS A="6" B="7" C="8" D="9" E="10">process_2</PROCESS></PROCESS_GROUP><SESSION_GROUP><SESSION AA="11" BB="22" CC="33">session_11</SESSION><SESSION AA="44" BB="55" CC="66">session_22</SESSION></SESSION_GROUP><PARM_GROUP><PARM>A=1;</PARM><PARM>B=2;</PARM><PARM>C=3;</PARM><PARM>D=4;</PARM></PARM_GROUP><EXTRA>AAA</EXTRA></nodes>

and used the following conf file to parse it:

input {
  file {
    path => "/files/input"
    start_position => "beginning"
  }
}

filter {

  # clone duplicates the initial event, which results in 3 events:
  #
  #   1. the initial event
  #   2. a copy with "type" = "process_group_tag"
  #   3. a copy with "type" = "session_group_tag"

  clone {
    clones => ["process_group_tag", "session_group_tag"]
  }

  # Handle the copy with "type" = "process_group_tag"
  if [type] == "process_group_tag" {
    xml {
      source => "message"
      store_xml => false
      xpath => [
        "/nodes/PROCESS_GROUP/PROCESS", "process"
      ]
    }
    split {
      field => "process"
    }
    xml {
      source => "process"
      store_xml => false
      force_array => false
      xpath => [
        "/PROCESS/text()", "process_name",
        "/PROCESS/@A", "A",
        "/PROCESS/@B", "B",
        "/PROCESS/@C", "C",
        "/PROCESS/@D", "D",
        "/PROCESS/@E", "E"
      ]
    }
  }

  # Handle the copy with "type" = "session_group_tag"
  if [type] == "session_group_tag" {
    xml {
      source => "message"
      store_xml => false
      xpath => [
        "/nodes/SESSION_GROUP/SESSION", "session"
      ]
    }
    split {
      field => "session"
    }
    xml {
      source => "session"
      store_xml => false
      force_array => false
      xpath => [
        "/SESSION/text()", "session_name",
        "/SESSION/@AA", "AA",
        "/SESSION/@BB", "BB",
        "/SESSION/@CC", "CC"
      ]
    }
  }

  # Handle the initial event (it has no "type" field)
  if ![type] {
    xml {
      source => "message"
      store_xml => false
      force_array => false
      xpath => [
        "/nodes/PARM_GROUP/PARM/text()", "params",
        "/nodes/EXTRA/text()", "extra"
      ]
    }
  }

  mutate {
    remove_field => ["host", "@version", "@timestamp", "path", "process", "session", "type", "message"]
  }
}

output {
  stdout {}
}

The results are as follows:

{
    "extra" => "AAA",
    "params" => [
        [0] "A=1;",
        [1] "B=2;",
        [2] "C=3;",
        [3] "D=4;"
    ]
}
{
    "session_name" => "session_11",
              "AA" => "11",
              "BB" => "22",
              "CC" => "33"
}
{
    "session_name" => "session_22",
              "AA" => "44",
              "BB" => "55",
              "CC" => "66"
}
{
    "process_name" => "process_1",
               "A" => "1",
               "B" => "2",
               "C" => "3",
               "D" => "4",
               "E" => "5"
}
{
    "process_name" => "process_2",
               "A" => "6",
               "B" => "7",
               "C" => "8",
               "D" => "9",
               "E" => "10"
}

If there is a more elegant solution, please let me know.
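
One possibly simpler variant for the original four-event goal is sketched below. It is untested and rests on an assumption: that the xml filter passes the XPath union operator (|) through to its XPath engine unchanged, so that PROCESS and SESSION elements land in a single array and one split is enough, with no clone or conditionals. The field name "node" is made up for illustration, and PARM_GROUP and EXTRA are not handled here.

```
filter {
  # Collect both element types into one array field (hypothetical name "node").
  xml {
    source => "message"
    store_xml => false
    xpath => [
      "/nodes/PROCESS_GROUP/PROCESS | /nodes/SESSION_GROUP/SESSION", "node"
    ]
  }

  # One split over one array: one event per PROCESS or SESSION element.
  split {
    field => "node"
  }

  # Each event holds exactly one element, so only the matching
  # expressions are expected to produce fields.
  xml {
    source => "node"
    store_xml => false
    force_array => false
    xpath => [
      "/PROCESS/text()", "process_name",
      "/PROCESS/@A", "A",
      "/PROCESS/@B", "B",
      "/PROCESS/@C", "C",
      "/PROCESS/@D", "D",
      "/PROCESS/@E", "E",
      "/SESSION/text()", "session_name",
      "/SESSION/@AA", "AA",
      "/SESSION/@BB", "BB",
      "/SESSION/@CC", "CC"
    ]
  }

  mutate {
    remove_field => ["host", "@version", "@timestamp", "path", "node", "message"]
  }
}
```

If the union expression is not accepted by the installed plugin version, the clone-based configuration above remains the fallback.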
