Logstash-XML parser

Hi,
I am new to ELK.
I am trying to parse a Jenkins build.xml file and index the output in Elasticsearch.
I am using codec => multiline on the input to get the whole file as a single event, but I cannot get the closing </build> tag into the message.

Could someone please help me with this? Thank you.

My input build.xml:

<?xml version='1.0' encoding='UTF-8'?> rattisyam 1 8 refs/remotes/origin/master 0eae8a5c898e67eab676b5e95983ac11a20e7534 refs/remotes/origin/master 6 https://github.com/rattisyam/maven-project.git refs/remotes/origin/master /var/lib/jenkins/workspace/PackageJob 35 1517408055456 1517408055457 SUCCESS 10748 UTF-8 false /var/lib/jenkins/workspace/PackageJob 2.73.3 false

My logstash.conf file:

input {
  file {
    path => "/var/lib/jenkins/jobs/PackageJob/builds/8/build.xml"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "buildxml"

    codec => multiline {
      pattern => "<?xml version='1.0' encoding='UTF-8'?>"
      negate => true
      what => "next"
      max_lines => 1000
    }
  }
}

filter {

}

output {
  elasticsearch {
    hosts => ["172.31.33.209:9200"]
    index => "buildxml1"
    document_type => "demobuildxml"
  }

  stdout { codec => rubydebug }
}

The output I am getting now:

<?xml version='1.0' encoding='UTF-8'?> rattisyam 1 17 refs/remotes/origin/master 0eae8a5c898e67eab676b5e95983ac11a20e7534 refs/remotes/origin/master 8 https://github.com/rattisyam/maven-project.git refs/remotes/origin/master /var/lib/jenkins/workspace/PackageJob 39 1517409456533 1517409456534 SUCCESS 10641 UTF-8 false /var/lib/jenkins/workspace/PackageJob 2.73.3 false

Here the end tag is missing, so I cannot apply the XML parser to the input message.

My goal is to get the username, build duration, build number, build status and project name from the build.xml file.

Please help me get these details into Kibana.

Regards,
Syam


Please help me with this config. Thank you.

When you post preformatted code, please use the preformatted text button so the forum doesn't mangle it. There's an icon that looks like </>: highlight your text and click that button. It helps us understand what you paste.

As for the issue, I'm not quite sure how the codec handles what => next. Does it attach the line at the beginning or at the end? Regardless, have you tried setting your pattern to <build> and what to previous? It seems like that would give you something like this:

<?xml version='1.0' encoding='UTF-8'?>
<build><actions><hudson.model.CauseAction><causeBag class="linked-hash-map">...</build>
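Below is a minimal sketch of that suggestion. It assumes the XML declaration is on line 1 and the <build> root element opens on line 2, as in the sample above. Note that pattern is a regular expression. In the earlier pattern "<?xml version='1.0' encoding='UTF-8'?>", each ? acts as a quantifier, so the pattern never matches the declaration line literally. Anchoring on ^<build> avoids those special characters altogether.

```
input {
  file {
    path => "/var/lib/jenkins/jobs/PackageJob/builds/8/build.xml"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "buildxml"
    codec => multiline {
      # A line starting with <build> begins a new event. Every line that does
      # NOT match (negate => true) is appended to the previous line, so the
      # <?xml ...?> declaration line ends up as its own, separate event.
      pattern => "^<build>"
      negate => true
      what => "previous"
      max_lines => 1000
    }
  }
}
```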

Afterwards, you could use an if conditional with the drop filter to get rid of the event holding the XML version/encoding line. For the problem with the missing end tag, add the multiline codec option auto_flush_interval.
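Here is a sketch of the if/drop idea. It assumes the declaration arrives as a separate event, as with the ^<build> pattern. The regex escapes ? so it matches the literal <?xml prefix. auto_flush_interval is not part of this filter; it goes inside the multiline codec block, as in the second config later in this thread. It makes the codec emit a pending event after that many seconds without new lines.

```
filter {
  # Drop the event that contains only the <?xml ...?> declaration line.
  # \? matches a literal question mark.
  if [message] =~ /^<\?xml/ {
    drop { }
  }
}
```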

Thank you very much for your answer.
I changed my logstash.conf file to the one below, but I still get the same output as above.
I also tried changing the pattern, and I still do not get the end tag.
input {
  file {
    path => "/var/lib/jenkins/jobs/PackageJob/builds/7/build.xml"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "buildxml"

    codec => multiline {
      pattern => "^<?xml version='1.0' encoding='UTF-8'?>"
      negate => true
      what => "previous"
      max_lines => 1000
      auto_flush_interval => 3
    }
  }
}

filter {

}

output {
  elasticsearch {
    hosts => ["172.31.33.209:9200"]
    index => "abcdef"
    document_type => "demobuildxml"
  }

  stdout { codec => rubydebug }
}


Output:
{
"message" => "<?xml version='1.0' encoding='UTF-8'?>\n\n \n <hudson.model.CauseAction>\n <causeBag class="linked-hash-map">\n \n <hudson.model.Cause_-UserIdCause>\n rattisyam\n </hudson.model.Cause_-UserIdCause>\n 1\n \n \n </hudson.model.CauseAction>\n <hudson.plugins.jobConfigHistory.JobConfigBadgeAction plugin="jobConfigHistory@2.18">\n \n 2018-01-31_14-29-10\n 2018-01-31_14-02-54\n \n </hudson.plugins.jobConfigHistory.JobConfigBadgeAction>\n <jenkins.metrics.impl.TimeInQueueAction plugin="metrics@3.1.2.10">\n 7\n </jenkins.metrics.impl.TimeInQueueAction>\n <hudson.plugins.git.util.BuildData plugin="git@3.6.4">\n \n \n refs/remotes/origin/master\n <hudson.plugins.git.util.Build>\n <marked plugin="git-client@2.6.0">\n 0eae8a5c898e67eab676b5e95983ac11a20e7534\n <branches class="list">\n <hudson.plugins.git.Branch>\n <sha1 reference="../../../sha1"/>\n refs/remotes/origin/master\n </hudson.plugins.git.Branch>\n \n \n <revision reference="../marked"/>\n 7\n </hudson.plugins.git.util.Build>\n \n \n <lastBuild reference="../buildsByBranchName/entry/hudson.plugins.git.util.Build"/>\n \n https://github.com/rattisyam/maven-project.git\n \n </hudson.plugins.git.util.BuildData>\n <hudson.plugins.git.GitTagAction plugin="git@3.6.4">\n <tags class="hudson.util.CopyOnWriteMap$Tree">\n \n refs/remotes/origin/master\n \n \n \n /var/lib/jenkins/workspace/PackageJob\n </hudson.plugins.git.GitTagAction>\n <hudson.scm.SCMRevisionState_-None/>\n \n 37\n 1517408954042\n 1517408954042\n SUCCESS\n 10718\n UTF-8\n false\n \n /var/lib/jenkins/workspace/PackageJob\n 2.73.3\n <scm class="hudson.plugins.git.GitChangeLogParser" plugin="git@3.6.4">\n false\n \n <culprits class="com.google.common.collect.EmptyImmutableSortedSet"/>",
"@version" => "1",
"host" => "ip-172-31-43-243",
"type" => "buildxml",
"tags" => [
[0] "multiline"
],
"path" => "/var/lib/jenkins/jobs/PackageJob/builds/7/build.xml",
"@timestamp" => 2018-03-01T09:03:07.930Z
}

Note: I am still unable to get the closing </build> tag into the message. Please help me get that last tag. I already changed what => previous and set auto_flush_interval => 3 (seconds).

Honestly, I'm not sure. What version of the Elastic Stack products are you using?

ELK version 6.1. Thanks.

Hi team, can anyone help me with this? Thank you.

Can anyone help with this, please? Thanks.

Blah... not sure why I didn't see this earlier. You aren't using the XML filter to parse the data. Logstash ingests the file, the multiline codec joins it all into a single event, and that event is sent straight to Elasticsearch. In the filter section you need to do a couple of things. This example is neither complete nor functional. You'll need to read the Logstash XML filter plugin documentation and learn some XPath basics to tailor it to your needs:

filter {
  xml {
    source => "message"
    xpath => [
      "xpath", "field1",
      "xpath", "field2"
    ]
  }
}
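Here is a more concrete sketch for the fields listed earlier: username, duration, build number, status and project name. It rests on two assumptions.

- **Element names.** The forum stripped the XML tags from the samples, so the names used below come from the usual Jenkins build.xml layout and should be checked against the real file. These are result, duration, and userId inside hudson.model.Cause_-UserIdCause.
- **Path-derived fields.** The job name and build number also appear in the file path (/var/lib/jenkins/jobs/PackageJob/builds/8/build.xml), so this sketch takes them from the path field with grok.

The destination field names (build_status, build_duration_ms, build_user, project_name, build_number) are hypothetical.

```
filter {
  xml {
    source => "message"
    # Only extract the XPath values; don't store the whole parsed document.
    store_xml => false
    # Element names are assumptions based on a typical Jenkins build.xml.
    # Each XPath result is stored as an array, e.g. ["SUCCESS"].
    xpath => {
      "/build/result/text()" => "build_status"
      "/build/duration/text()" => "build_duration_ms"
      "//hudson.model.Cause_-UserIdCause/userId/text()" => "build_user"
    }
  }

  # Job name and build number from the file path, e.g.
  # /var/lib/jenkins/jobs/PackageJob/builds/8/build.xml
  grok {
    match => { "path" => "/jobs/(?<project_name>[^/]+)/builds/%{INT:build_number}/build\.xml$" }
  }
}
```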

Thanks for your reply.

The multiline codec is not giving me complete XML (it is missing the closing </build> tag), so I cannot parse it with the XML filter and XPath.

What output are you getting now?

I am getting the same output as before:
stdout { codec => rubydebug }

}

Output:
{
"message" => "<?xml version='1.0' encoding='UTF-8'?>\n\n \n <hudson.model.CauseAction>\n \n \n <hudson.model.Cause_-UserIdCause>\n rattisyam\n </hudson.model.Cause_-UserIdCause>\n 1\n \n \n </hudson.model.CauseAction>\n <hudson.plugins.jobConfigHistory.JobConfigBadgeAction plugin="jobConfigHistory@2.18">\n \n 2018-01-31_14-29-10\n 2018-01-31_14-02-54\n \n </hudson.plugins.jobConfigHistory.JobConfigBadgeAction>\n <jenkins.metrics.impl.TimeInQueueAction plugin="metrics@3.1.2.10">\n 7\n </jenkins.metrics.impl.TimeInQueueAction>\n <hudson.plugins.git.util.BuildData plugin="git@3.6.4">\n \n \n refs/remotes/origin/master\n <hudson.plugins.git.util.Build>\n \n 0eae8a5c898e67eab676b5e95983ac11a20e7534\n \n <hudson.plugins.git.Branch>\n \n refs/remotes/origin/master\n </hudson.plugins.git.Branch>\n \n \n \n 7\n </hudson.plugins.git.util.Build>\n \n \n \n \n https://github.com/rattisyam/maven-project.git\n \n </hudson.plugins.git.util.BuildData>\n <hudson.plugins.git.GitTagAction plugin="git@3.6.4">\n \n \n refs/remotes/origin/master\n \n \n \n /var/lib/jenkins/workspace/PackageJob\n </hudson.plugins.git.GitTagAction>\n <hudson.scm.SCMRevisionState_-None/>\n \n 37\n 1517408954042\n 1517408954042\n SUCCESS\n 10718\n UTF-8\n false\n \n /var/lib/jenkins/workspace/PackageJob\n 2.73.3\n \n false\n \n ",
"@version" => "1",
"host" => "ip-172-31-43-243",
"type" => "buildxml",
"tags" => [
[0] "multiline"
],
"path" => "/var/lib/jenkins/jobs/PackageJob/builds/7/build.xml",
"@timestamp" => 2018-03-01T09:03:07.930Z
}

Note: I am still unable to get the closing </build> tag into the message. Please help me get that last tag. I already changed what => previous and set auto_flush_interval => 3 (seconds).
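One more thing worth checking, offered as an assumption rather than a confirmed diagnosis. The file input hands a line to the codec only after it reads the line delimiter, which is a newline by default. If build.xml ends with </build> and no trailing newline, that last line never reaches the multiline codec. In that case neither what => previous nor auto_flush_interval can add it to the event. Making sure the file ends with a newline would fix that. Otherwise, here is a workaround sketch that appends the missing closing tag before the xml filter runs.

```
filter {
  # Workaround sketch: if the closing root tag never arrived (for example
  # because the file's last line has no trailing newline), append it so the
  # XML in "message" is well-formed before parsing.
  if [message] !~ /<\/build>/ {
    mutate {
      replace => { "message" => "%{message}</build>" }
    }
  }
  # Place the xml filter after this block so it sees the repaired message.
}
```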
