Parse XML entry with Logstash and create a new field with the data

I can see no reason why what you tried would not work.

Hi @Badger,

Is there a way for Logstash to read the updated contents of a file before sending the event to Elasticsearch?

My build.xml contains a <duration> tag. It is assigned the value 0 when the process (a Jenkins job) starts, and Logstash reads the file and creates an event from it at that point. Once the Jenkins job completes, the duration value is updated to the actual time the job took, but Logstash does not pick up the new value and sends 0 in the event to Elasticsearch.

Is there a way for Logstash to re-read the updated value of a tag that it has already read?

Below is my Logstash configuration:

    input {
      file {
        path => "/var/jenkins_home/jobs/CTSD/jobs/apis/jobs/eatviewer/branches/*/builds/*/build.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        type => "xml"
        codec => multiline {
           pattern => '^[A-Z]{1}[a-z]{2} {1,2}[0-9]{1,2},[0-9]{4} {1,2}[0-9]{1,2}:[0-9]{2}:[0-9]{2}'
           negate => true
           what => previous
           max_lines => 10000000000
           auto_flush_interval => 600
        }
      }
    }
    filter {
      xml {
        source => "message"
        store_xml => false
        xpath => [
            "/flow-build/startTime/text()", "startTime",
            "/flow-build/duration/text()", "duration",
            "/flow-build/execution/result/text()", "result"
        ]
      }
    }
    output {
      elasticsearch { hosts => [ "https://elastic:443/elasticsearch" ] index => "elktest-%{+YYYY.MM.dd}" }
      stdout { codec => rubydebug }
    }

If it rewrites the same file, then no, logstash cannot do that.

Hi @Badger,

Thanks, but is there a way to have the XML file parsed only once the <duration> value is non-zero?
I am currently stuck, because without this the data fetched from the XML is not correct. The file is only valid for parsing by Logstash once duration has a non-zero value.

Hi @Badger, could you please suggest something for this?

You could use a conditional and drop {} the event if the value of the duration field is zero, or possibly if [message] contains <duration>0</duration>.
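A minimal sketch of that conditional, assuming it is placed after the xml filter from the configuration above. Because xpath stores its results as arrays of strings, the comparison targets the first element; the commented-out variant checks the raw message instead and only applies if the message field has not been removed. This only keeps the zero-duration event out of Elasticsearch.

```
filter {
  # xpath results are arrays of strings, so compare the first element
  if [duration][0] == "0" {
    drop { }
  }

  # Alternative: test the raw XML text instead of the extracted field
  # if "<duration>0</duration>" in [message] {
  #   drop { }
  # }
}
```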

Thanks @Badger, but will Logstash read this XML again and generate an event once the duration value is non-zero?

As I said before, if the file gets rewritten then the file input will not reread it.

Is there a way to define in logstash.conf that Logstash should pick up build.xml only once the file has been updated?

We tried stat_interval, which does delay Logstash's reading of build.xml, but we are not sure that is a reliable solution.

The file input is designed to tail log files. If a new file is created (with a new inode) then logstash will read it. If the same file is rewritten from the beginning then a file input will not re-read it.

Hi @Badger,

I am trying to convert the epoch value in the startTime field to a readable timestamp, but I am getting a parse failure.
I want to convert the values of <startTime> and <duration> to a readable format. Can you help me find the issue?

Below are my configuration and the result after Logstash filtering:

 logstash.conf: |
    input {
      file {
        path => "/var/jenkins_home/jobs/CTSD/jobs/sdapplications/branches/master/builds/7/build.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        type => "xml"
        codec => multiline {
           pattern => '^[A-Z]{1}[a-z]{2} {1,2}[0-9]{1,2},[0-9]{4} {1,2}[0-9]{1,2}:[0-9]{2}:[0-9]{2}'
           negate => true
           what => previous
           max_lines => 10000000000
           auto_flush_interval => 60
        }
      }
    }
    filter {
      xml {
        source => "message"
        store_xml => false
        xpath => [
            "/flow-build/startTime/text()", "startTime",
            "/flow-build/duration/text()", "duration",
            "/flow-build/execution/result/text()", "result"
        ]
        remove_field => [ "message" ]
      }
    }
    filter {
      date {
        timezone => "UTC"
        match => ["startTime", "UNIX_MS"]
        target => "startTime"
      }
    }
    output {
      elasticsearch { hosts => [ "https://elastic:443/elasticsearch" ] index => "elktest-%{+YYYY.MM.dd}" }
    }

**Output:**

{
      "@version" => "1",
          "type" => "xml",
     "startTime" => [
        [0] "1651059323781"
    ],
          "path" => "/var/jenkins_home/jobs/CTSD/jobs/sdapplications/branches/master/builds/7/build.xml",
        "result" => [
        [0] "SUCCESS"
    ],
          "tags" => [
        [0] "multiline",
        [1] "_dateparsefailure"
    ],
          "host" => "elktest-0",
      "duration" => [
        [0] "423817"
    ],
    "@timestamp" => 2022-04-27T17:36:04.619Z
}

startTime is an array, so you need match => ["[startTime][0]", "UNIX_MS"]
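Applied to the date filter from the configuration in the question, that change might look like the sketch below. Only the field reference in match is different; the other settings are carried over unchanged.

```
filter {
  date {
    timezone => "UTC"
    # xpath stores startTime as an array, so reference its first element
    match => ["[startTime][0]", "UNIX_MS"]
    target => "startTime"
  }
}
```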
