Logstash problems with xml filter

Hi all,
I want to collect only certain elements from my XML file (the ones with the Parameter tag). Here is an example of the XML I want to parse:

    <Instrument Name="GLPO" DisplayName="AAAA" HeartBeat="BBBB">
      <Component Name="AutoCtf" DisplayName="AutoCtf" ServiceCategory="None">
        <Parameter ID="495" EventID="497" Name="Defocus" />
        <Parameter ID="496" EventID="497" Name="Astigmatism" />
        <Parameter ID="497" EventID="497" Name="AstigmatismOrientation" />
      </Component>
    </Instrument>

The problem is that, besides the right data, Logstash also indexes lines that do not contain a Parameter tag. Here is what I get when I run Logstash:

    {
          "@version" => "1",
          "event_id" => [
            [0] "498"
        ],
              "host" => "NLEIN-GZCVWZ1",
              "type" => "healthmonitoring",
         "health_id" => [
            [0] "512"
        ],
        "@timestamp" => 2021-02-18T09:04:20.528Z,
              "path" => "C:/Users/aleksei.poliakov/Desktop/Internship/Logs/HealthMonitorCmd_20200817_153946.xml",
           "message" => "        <Parameter ID=\"512\" EventID=\"498\" Name=\"Iteration\" DisplayName=\"Iteration\" Type=\"Int\" StorageUnit=\"\" DisplayUnit=\"\" DisplayScale=\"\" FormatString=\"\" ServiceCategory=\"None\" MaxLogInterval=\"00:00:00\" AbsoluteMinimum=\"-1.7976931348623157E+308\" AbsoluteMaximum=\"1.7976931348623157E+308\" />\r"
    }
    {
          "@version" => "1",
              "host" => "NLEIN-GZCVWZ1",
              "type" => "healthmonitoring",
        "@timestamp" => 2021-02-18T09:04:20.528Z,
              "path" => "C:/Users/aleksei.poliakov/Desktop/Internship/Logs/HealthMonitorCmd_20200817_153946.xml",
           "message" => "      </Component>\r"
    }

My config file:

    input {
        file {
            path => ["C:/Users/aleksei.poliakov/Desktop/Internship/Logs/HealthMonitorCmd_20200817_153946.xml"]
            start_position => "beginning"
            sincedb_path => "NUL"
            type => "healthmonitoring"
            exclude => "*.gz"
        }
    }
    filter {
        xml {
            store_xml => false
            source => "message"
            target => "Parameter"
            xpath =>
            [
                "//Parameter/@ID", "health_id",
                "//Parameter/@EventID", "event_id"
            ]
        }
    }
    output {
        if [type] == "healthmonitoring" {
            elasticsearch {
                hosts => ["localhost:9200"]
                index => "health-monitoring-%{+DDMMYYYY}"
            }
        }
        stdout { }
    }

Thank you in advance!

With that data and that filter I get

 "health_id" => [
    [0] "495",
    [1] "496",
    [2] "497"
],
  "event_id" => [
    [0] "497",
    [1] "497",
    [2] "497"
],

So I do not think you are doing what you think you are doing.

Hi Badger,
Thank you for the reply.
What do your logs look like? If you check my post, you can see that some events do not have the data I am interested in. I even tried a different computer, and the result is the same.

P.S. I'm new to this, so I don't really know whether this is normal or not.

I want to get something like this:

 {
   "health_id" = "495",
   "event_id" = "497"
 }
 {
   "health_id" = "496",
   "event_id" = "497"
 }
 {
   "health_id" = "497",
   "event_id" = "497"
 }
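
One possible way to get events shaped like the ones above while still reading the file line by line, offered as an untested sketch: drop every event for which the xpath expressions found nothing, then unwrap the one-element arrays. It assumes that each Parameter element sits on a single line, as in the sample, and that every Parameter carries both an ID and an EventID attribute (otherwise the mutate would leave a literal %{...} string in the field).

```conf
filter {
    xml {
        source => "message"
        store_xml => false
        xpath => {
            "//Parameter/@ID"      => "health_id"
            "//Parameter/@EventID" => "event_id"
        }
    }

    # Lines such as </Component> yield no health_id field, so discard them.
    if ![health_id] {
        drop { }
    }

    # xpath results are stored as arrays; keep the single value as a string.
    # Assumption: both fields exist on every event that reaches this point.
    mutate {
        replace => {
            "health_id" => "%{[health_id][0]}"
            "event_id"  => "%{[event_id][0]}"
        }
    }
}
```

This keeps one event per Parameter line, which is the shape asked for here, at the cost of relying on the file's line layout rather than on its XML structure.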

I used

 <Instrument Name="GLPO" DisplayName="AAAA" HeartBeat="BBBB">
  <Component Name="AutoCtf" DisplayName="AutoCtf" ServiceCategory="None">
    <Parameter ID="495" EventID="497" Name="Defocus" />
    <Parameter ID="496" EventID="497" Name="Astigmatism" />
    <Parameter ID="497" EventID="497" Name="AstigmatismOrientation" />
  </Component>
</Instrument>

Sorry, I meant indexes. What do they look like?

I do not have indexes. I do not run elasticsearch.

Can you show me your config file, in case you used a different one? I get completely different output:

{
      "@version" => "1",
      "event_id" => [
        [0] "498"
    ],
          "type" => "healthmonitoring",
     "health_id" => [
        [0] "512"
    ],
    "@timestamp" => 2021-02-18T09:04:20.528Z
}

{
      "@version" => "1",
          "type" => "healthmonitoring",
    "@timestamp" => 2021-02-18T09:04:20.528Z
}

Well you would, since your [message] field is

"message" => " <Parameter ID="512" EventID="498" Name="Iteration" DisplayName="Iteration" Type="Int" StorageUnit="" DisplayUnit="" DisplayScale="" FormatString="" ServiceCategory="None" MaxLogInterval="00:00:00" AbsoluteMinimum="-1.7976931348623157E+308" AbsoluteMaximum="1.7976931348623157E+308" />\r"

That matches the health_id and event_id in your event.

I wonder if your problem is that you are consuming the XML line by line, so each line becomes its own event. You may need a multiline codec so that the whole document arrives as a single event. Read this post and the posts it links to.
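
As a rough illustration of the multiline suggestion, the file input could be given a codec that glues every line onto the previous one, so the whole file reaches the xml filter as one event. This is a sketch under assumptions, not a tested configuration: the pattern string is a made-up value that no line in the file is expected to match, and the max_lines value is an arbitrary ceiling chosen for illustration (the codec's default is 500 lines).

```conf
input {
    file {
        path => ["C:/Users/aleksei.poliakov/Desktop/Internship/Logs/HealthMonitorCmd_20200817_153946.xml"]
        start_position => "beginning"
        sincedb_path => "NUL"
        type => "healthmonitoring"
        codec => multiline {
            # Hypothetical pattern that should never match a real line.
            # With negate => true, every non-matching line is appended
            # to the previous one, so the file becomes a single event.
            pattern => "^NeverMatchesAnyLine"
            negate => true
            what => "previous"
            # Emit the pending event once no new line has arrived for 2 seconds.
            auto_flush_interval => 2
            # Assumed ceiling for illustration; raise it if the file is longer.
            max_lines => 20000
        }
    }
}
```

With the whole document in [message], the xpath expressions would be expected to return arrays holding every ID and EventID in the file, like the three-element arrays shown earlier in the thread, rather than one event per Parameter.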

Thank you, I will check that post. Somehow I thought the xml filter could do that by default. What is the point of the xml filter if you can do literally the same with the grok filter?!
