Parse XML entry with Logstash and create a new field with the data

I need to parse the XML file below with Logstash, extract the status, and create a new field with its value.

Below is the sample XML file:

    <?xml version="1.1" encoding="UTF-8"?>
    <flow="abc">
    <tag>SUCCESS</tag>
    </flow>

I have created the Logstash configuration below, but it is giving an error and not producing the required result.

    apiVersion: v1
    data:
      logstash.conf: |
        input {
          beats {
            port => 5044
          }
        }
        filter {
          xml {
            source => "message"
            target => "xml_content"
          }
          split {
            field => "xml_content[flow]"
          }
          split {
            field => "xml_content[flow][result]"
          }
          mutate {
            add_field => { "status" => "%{xml_content[flow][result]}" }
          }
        }
        output { elasticsearch { hosts => "Elasticsearch" } }

You need to format your post using markdown. If you Google "markdown tutorial" you will find multiple sites that provide one. Use the preview pane on the right of the edit pane to make sure the code is formatted correctly.

What error do you get? What does the [message] field of your event look like? (Expand an event in the Discover pane of Kibana and copy and paste from the JSON tab.)

Hello Badger,

The message field contains the complete XML:

    <?xml version="1.1" encoding="UTF-8"?>
    <flow-build plugin="workflow-job@1145.v7f2433caa07f">
    <actions>
    <hudson.model.CauseAction>
    <causeBag class="linked-hash-map">
    <entry>
    <hudson.model.Cause_-UserIdCause>
    <userId>503260426</userId>
    </hudson.model.Cause_-UserIdCause>
    <int>1</int>
    </entry>
    <queueId>73</queueId>
    <timestamp>1649770295128</timestamp>
    <startTime>1649770295158</startTime>
    <result>SUCCESS</result>
    </flow-build>

I want to fetch the result data, whether SUCCESS or FAILURE, and create a new field for it.

@Badger, can you please suggest?

If the [message] field contains that text, then

    xml {
        source => "message"
        store_xml => false
        xpath => { "//result/text()" => "result" }
    }

will produce

    "result" => [
    [0] "SUCCESS"
],

Using xpath always results in the extracted objects being arrays. You can adjust that using

    mutate { replace => { "result" => "%{[result][0]}" } }

to get

    "result" => "SUCCESS",

If you edit the XML to be valid (by closing the actions, hudson.model.CauseAction, and causeBag elements) then another way to do it would be

    xml {
        source => "message"
        target => "theXML"
        force_array => false
    }

which gets you

                      "result" => "SUCCESS",
                     "queueId" => "73",
                       "entry" => {
        "hudson.model.Cause_-UserIdCause" => {
            "userId" => "503260426"
        },

etc.
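
If you go the store_xml route, the extracted value ends up nested under the target field, so copying it to a top-level field could look something like this (a sketch, using the theXML target from the example above):

    mutate { copy => { "[theXML][result]" => "result" } }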

Hello @Badger ,

I tried the below as you suggested. Although it is adding a new field, the value in the field is not getting replaced.

    filter {
      xml {
        source => "message"
        store_xml => false
        xpath => { "//flow-build/result/text()" => "result" }
      }
      mutate {
        replace => { "result" => "%{result}" }
      }
    }

Field in Kibana:

[screenshot]

I tried it both with the array reference and without it.

[screenshot]

I tried the below logstash.conf as well:

    logstash.conf: |
      input {
        beats {
          port => 5044
        }
      }
      filter {
        prune {
          blacklist_names => [ " \e[8mha.\e[0m " ]
        }
      }
      filter {
        mutate {
          gsub => ["message", "\e[8mha.\e[0m", ""]
        }
      }
      filter {
        xml {
          source => "message"
          store_xml => false
          xpath => { "/flow-build[@plugin]/result/text()" => "result" }
        }
        mutate {
          add_field => { "result" => "%{result}" }
        }
      }
      output { elasticsearch { hosts => "Elasticsearch:9200" } }

to fetch the string in the result tag of the XML, under the structure /flow-build/result, to create a new field and display it in Kibana.

As I said, if the [message] field contains the text you said it contains, then the filter configuration I posted will work. If it contains something else, then it may not. Given that what you posted is not valid XML, I suspect the [message] field may be slightly different.

Hello @Badger, there is XML in the message field in Kibana. Is there any way to confirm that?
I am new to Logstash and Kibana. Can you please let me know if there is any way to check what is in the message field?

Earlier today (my today, possibly your yesterday) you started another thread, which I spent some time working on, and then you deleted it before I could post my answer.

An XML filter expects a single XML element to surround everything in the source field. If there are two or more top-level XML elements then it will complain about trying to add a second item at the root.

You may be able to fix the message using something like

mutate { gsub => [ "sourceField", "</endOfUsefulPart>.*", "</endOfUsefulPart>" ] }

And I realize that there was a ton of information in that post that you probably did not want to share. But it is hard for us to help you without a reproducible failure. If you do provide one then I, and several other folks, will be happy to test it and help.

Spending time to narrow down a reproducible example adds a skill that will let you get more answers from more people.

As an example of reproduction... if you have a grok pattern that includes IPV4, do not obfuscate your IP address as "a.b.c.d" (which is not a valid IP address); just replace it with "1.2.3.4" (which is valid). It is a trivial change for you, makes testing things easier for every person who reviews questions here, and makes it much more likely that one of us will take the time to provide guidance.

If your message field contains that complete XML flow-build element then

    xml {
        source => "message"
        store_xml => false
        xpath => { "/flow-build/result/text()" => "result" }
        remove_field => [ "message" ]
    }

will produce

    "result" => [
    [0] "SUCCESS"
]

Hello @Badger ,

I tried the above, but it did not create a new field named result in the document in Kibana when the XML was parsed by Logstash and sent to Elasticsearch.
Do I need to add something else to take the value fetched by xpath and create a new field from it?

I do not know why it would not do so.

Hello @Badger ,

It seems like the source XML file (build.xml), which my Filebeat is sending to Logstash in the message field, is getting corrupted while being transferred. I tested the XML both at the source location and in the message field. The XML file at the source location is in the correct format, but when it is sent in the message field its structure is getting changed.
Do you know any way to resolve this?

That sounds like a filebeat question, and one I cannot answer.

Thanks @Badger. I found that some part of the XML in the message field is getting repeated, like below, which is causing the issue while parsing the XML.

The part below is coming after the end tag </flow-build>.

Is there a way in Logstash to mark the beginning and end tags of the XML to be parsed?

 <name>master</name>
          </hudson.plugins.git.BranchSpec>
        </branches>
        <doGenerateSubmoduleConfigurations>false</doGenerateSubmoduleConfigurations>
        <submoduleCfg class="empty-list"/>
        <extensions>
          <jenkins.plugins.git.GitSCMSourceDefaults>
            <includeTags>false</includeTags>
          </jenkins.plugins.git.GitSCMSourceDefaults>
          <hudson.plugins.git.extensions.impl.BuildChooserSetting>
            <buildChooser class="jenkins.plugins.git.AbstractGitSCMSource$SpecificRevisionBuildChooser">
              <revision reference="../../../../../../../actions/hudson.plugins.git.util.BuildData[3]/buildsByBranchName/entry/hudson.plugins.git.util.Build/marked"/>
            </buildChooser>
          </hudson.plugins.git.extensions.impl.BuildChooserSetting>
        </extensions>
      </scm>
      <node>build-agent-n8m6b</node>
      <workspace>/home/jenkins/agent/workspace/CTSD_apis_eatviewer_master</workspace>
      <pollingBaseline class="hudson.scm.SCMRevisionState$None" reference="../../../actions/org.jenkinsci.plugins.workflow.steps.scm.MultiSCMRevisionState/revisionStates/entry/hudson.scm.SCMRevisionState_-None"/>
    </org.jenkinsci.plugins.workflow.job.WorkflowRun_-SCMCheckout>
  </checkouts>

If you want to discard that, you could try

    mutate { gsub => [ "message", "(</flow-build>).*", "\1" ] }

which will keep the </flow-build> but delete everything after it.

Thanks @Badger. I changed and added the below, and then it worked.
I was able to get <result> as a field.

    filter {
      mutate {
        gsub => [ "message", "(</flow-build>).*</checkouts>", "\1" ]
      }
    }

I also want <startTime> to be extracted from the XML in the "message" field, which has an epoch time as its value, and to create a new field for it as well after converting the epoch value to a readable timestamp.

I tried the code below to extract both fields, but it just created the <startTime> field and not the result field.

data:
  logstash.conf: |
    input {
      beats {
        port => 5044
      }
    }
    filter {
      mutate {
        gsub => [ "message", "(</flow-build>).*</checkouts>", "\1" ]
      }
    }
    filter {
      xml {
        source => "message"
        store_xml => false
        xpath => [
           "/flow-build/result/text()", "result",
           "/flow-build/startTime/text()", "startTime"
        ]
        remove_field => [ "message" ]
      }
    }

Is there any way I can extract multiple fields from the XML and then remove the message field?

Hello @Badger, can you please advise here if possible?
Thanks for your support.
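
One way to do both at once, sketched below (untested; parsedStartTime is just an illustrative target name, and the date filter's UNIX_MS pattern assumes startTime is epoch milliseconds, as 1649770295158 appears to be):

    filter {
      xml {
        source => "message"
        store_xml => false
        xpath => [
          "/flow-build/result/text()", "result",
          "/flow-build/startTime/text()", "startTime"
        ]
      }
      # xpath extractions are arrays; flatten both to their first element
      mutate {
        replace => {
          "result" => "%{[result][0]}"
          "startTime" => "%{[startTime][0]}"
        }
      }
      # parse epoch milliseconds into a timestamp in a new field
      date {
        match => [ "startTime", "UNIX_MS" ]
        target => "parsedStartTime"
      }
      # remove message only after the xml filter has run
      mutate { remove_field => [ "message" ] }
    }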