Logstash split xml fields

With reference to the XML file from the thread "Logstash, split event from an xml file in multiples documents keeping information from root tags" (given at the end), I am trying to split the XML tags using the following filter. My intention is to generate a separate event for each tag.

filter {
        xml {
                source => "message"
                target => "parsed"
                store_xml => "false"
        }
        split {
                field => "[R]"
                # field => "[parsed][R]"
        }
        split {
                field => "[R][TS]"
                # field => "[parsed][R][TS]"
        }
}

But I am getting either _split_type_failure or "Only String and Array types are splittable" .........nilclass
I am using Logstash 5.4.1.

<R browserExecutionType="Parallel" endTime="02/09/2016 04:21:27 PM" executionEnvironment="Local" name="Test Execution R - bookHotelNew" startTime="02/09/2016 04:18:46 PM">
<TS TSExecutionType="Sequence" browser="CHROME" desc="bookHotelNew_CHROME" endTime="02/09/2016 04:19:54 PM" iterationType="TestScenario" name="bookHotelNew_CHROME" startTime="02/09/2016 04:18:46 PM" status="0">
	<TC desc="TS_bookHotelNew" endTime="02/09/2016 04:19:46 PM" name="TS_bookHotelNew" startTime="02/09/2016 04:18:50 PM" status="0">
		<TCIter endTime="02/09/2016 04:19:46 PM" startTime="02/09/2016 04:18:51 PM" status="0" value="0">
			<PB bddKeyWord="" desc="BPC_s02_Login" endTime="02/09/2016 04:19:03 PM" name="BPC_s02_Login" startTime="02/09/2016 04:18:51 PM" status="1" totalEnabledSteps="4" totalSteps="4">
				<BPIter endTime="02/09/2016 04:19:03 PM" failed="0" notExecuted="0" passed="0" skipped="0" startTime="02/09/2016 04:18:51 PM" status="1" totalEnabledSteps="4" totalSteps="4" value="0">
					<Res ErrorImagePath="" actSeq="1" action="driver_get" conditionString="NA" conditionType="NA" endTime="02/09/2016 04:18:54 PM" fromStep="0" isConditionApplied="false" iterations="0" message="Perform 'driver_get' operation for the value 'http://adactin.com//HotelApp/index.php,'" object="" outputParamName="" outputParamValue="" skipReason="" startTime="02/09/2016 04:18:51 PM" status="1" step="0" toStep="0" />
					<Res ErrorImagePath="" actSeq="2" action="element_setElementText" conditionString="NA" conditionType="NA" data="andisrinu," endTime="02/09/2016 04:18:57 PM" fromStep="0" isConditionApplied="false" iterations="0" message="Perform 'element_setElementText' operation on 'By.id: username' for the value 'andisrinu,'" object="By.id: username" outputParamName="" outputParamValue="" skipReason="" startTime="02/09/2016 04:18:54 PM" status="1" step="0" toStep="0" />

				</BPIter>
			</PB>
		</TCIter>
	</TC>
	<TC desc="TS_testSc" endTime="02/09/2016 04:19:54 PM" name="TS_testSc" startTime="02/09/2016 04:19:46 PM" status="1">
		<TCIter endTime="02/09/2016 04:19:54 PM" startTime="02/09/2016 04:19:46 PM" status="1" value="0">
			<PB bddKeyWord="" desc="BPC_s01_Login_Logout" endTime="02/09/2016 04:19:54 PM" name="BPC_s01_Login_Logout" startTime="02/09/2016 04:19:46 PM" status="1" totalEnabledSteps="6" totalSteps="6">
				<BPIter endTime="02/09/2016 04:19:54 PM" failed="0" notExecuted="0" passed="0" skipped="0" startTime="02/09/2016 04:19:46 PM" status="1" totalEnabledSteps="6" totalSteps="6" value="0">
					<Res ErrorImagePath="" actSeq="1" action="driver_get" conditionString="NA" conditionType="NA" endTime="02/09/2016 04:19:46 PM" fromStep="0" isConditionApplied="false" iterations="0" message="Perform 'driver_get' operation for the value 'http://adactin.com//HotelApp/index.php,'" object="" outputParamName="" outputParamValue="" skipReason="" startTime="02/09/2016 04:19:46 PM" status="1" step="0" toStep="0" />
				</BPIter>
			</PB>
		</TCIter>
	</TC>
</TS>
<TS TSExecutionType="Sequence" browser="IE" desc="bookHotelNew_IE" endTime="02/09/2016 04:21:26 PM" iterationType="TestScenario" name="bookHotelNew_IE" startTime="02/09/2016 04:19:58 PM" status="0">
	<TC desc="TS_bookHotelNew" endTime="02/09/2016 04:21:06 PM" name="TS_bookHotelNew" startTime="02/09/2016 04:20:00 PM" status="0">
		<TCIter endTime="02/09/2016 04:21:06 PM" startTime="02/09/2016 04:20:00 PM" status="0" value="0">
			<PB bddKeyWord="" desc="BPC_s02_Login" endTime="02/09/2016 04:20:08 PM" name="BPC_s02_Login" startTime="02/09/2016 04:20:00 PM" status="1" totalEnabledSteps="4" totalSteps="4">
				<BPIter endTime="02/09/2016 04:20:08 PM" failed="0" notExecuted="0" passed="0" skipped="0" startTime="02/09/2016 04:20:00 PM" status="1" totalEnabledSteps="4" totalSteps="4" value="0">
					<Res ErrorImagePath="" actSeq="1" action="driver_get" conditionString="NA" conditionType="NA" endTime="02/09/2016 04:20:00 PM" fromStep="0" isConditionApplied="false" iterations="0" message="Perform 'driver_get' operation for the value 'http://adactin.com//HotelApp/index.php,'" object="" outputParamName="" outputParamValue="" skipReason="" startTime="02/09/2016 04:20:00 PM" status="1" step="0" toStep="0" />
				</BPIter>
			</PB>
		</TCIter>
	</TC>
</TS>

NB: Is it possible to generate an event for each tag, including the root tag?

Do this step by step. Comment out the split filters. What do the resulting events look like? Use a stdout { codec => rubydebug } output. If they look as you'd expect, add the first split filter. What happens?

Hi magnusbaeck,
I did this step by step. Without the split filter, rubydebug produces the correct output in the terminal. The result is a single event that I can send to Kibana. To get multiple events for the tag <item name="Software Application">, I tried the split filter on one field, but rubydebug runs into an infinite loop: it keeps printing the parsed fields in the terminal until I press CTRL+C. If I start the Logstash service, it keeps producing the following logs.

..........................
_globbed_files: /home/xmldata/in.xml: glob is: ["/home/xmldata/in.xml"]
Pushing flush onto pipeline
Pushing flush onto pipeline
...........................
...........................

New xml file is here.

input.conf

input {
 file {
         path => "/home/xmldata/in.xml"
         start_position => beginning
         sincedb_path => "/dev/null"
         ignore_older => 0
         codec => multiline {
                              pattern => "^<NetReport>"
                              negate => "true"
                              what => "previous"
                              multiline_tag => "multi_tagged"
                              max_lines => 5000
        }
      }
}

filter.conf

filter {
        xml {
                source => "message"
                target => "parsed"
                store_xml => "false"
                xpath => [
                           "NetReport/@datetime", "date_time",
                           "NetReport/Net/@managedClientID", "cid" ,
                           "/NetReport/Net/category[@name='Software']/item[@name='Software Application']//property[@name='Name']/text()", "Pname",
                           "/NetReport/Net/category[@name='Software']/item[@name='Software Application']//property[@name='Caption']/text()", "Caption",
                          # "/NetReport/Net/category[@name='Software']/item[@name='Software Application']//property[@name='Description']/text()", "Description",
                           "/NetReport/Net/category[@name='Software']/item[@name='Software Application']//property[@name='Identifying Number']/text()", "Identity_No",
                           "/NetReport/Net/category[@name='Software']/item[@name='Software Application']//property[@name='Install Date']/text()", "Install_Date",
                           "/NetReport/Net/category[@name='Software']/item[@name='Software Application']//property[@name='Install Date 2']/text()", "Install_Date2",
                           "/NetReport/Net/category[@name='Software']/item[@name='Software Application']//property[@name='Install State']/text()", "Install_State",
                           "/NetReport/Net/category[@name='Software']/item[@name='Software Application']//property[@name='Vendor']/text()","vendor",
                           "/NetReport/Net/category[@name='Software']/item[@name='Software Application']//property[@name='Version']/text()","version"
                ]
        }
        # split {
        #         field => "Pname"
        # }
        mutate {
                remove_field => [ "message", "parsed" ]
        }
}

output.conf

output{
       stdout { codec => "rubydebug" }
}

Output without split

root@ubuntu:/etc/logstash/conf.d# /usr/share/logstash/bin/logstash -f 10-xml-input.conf --debug --verbose
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
Could not find log4j2 configuration at path //usr/share/logstash/config/log4j2.properties. Using default config which logs to console
09:48:49.341 [[main]-pipeline-manager] INFO logstash.pipeline - Starting pipeline {"id"=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>250}
09:48:49.635 [[main]-pipeline-manager] INFO logstash.pipeline - Pipeline main started
09:48:49.731 [Api Webserver] INFO logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
^C09:48:51.758 [SIGINT handler] WARN logstash.runner - SIGINT received. Shutting down the agent.
09:48:51.766 [LogStash::Runner] WARN logstash.agent - stopping pipeline {:id=>"main"}
{
"Pname" => [
[ 0] "Windows Genuine Advantage Validation Tool (KB892130)",
[ 1] "Intel(R) PRO Network Connections 11.2.0.69",
[ 2] "AddressBook",
..................
],
"Identity_No" => [
[ 0] "null",
[ 1] "{2222B364-0854-4265-B32E-A142DB9DC7BB}",
[ 2] "null",
..................
],
"Install_State" => [
[ 0] "5",
[ 1] "5",
[ 2] "5",
..................
],
"Install_Date2" => [
[ 0] "null",
[ 1] "01-19-2010",
[ 2] "null",
..................
],
"version" => [
[ 0] "1.7.0069.2",
[ 1] "null",
[ 2] "null",
...................
],
"Caption" => [
[ 0] "Windows Genuine Advantage Validation Tool (KB892130)",
[ 1] "Intel(R) PRO Network Connections 11.2.0.69",
[ 2] "AddressBook",
..................
],
"tags" => [
[0] "multi_tagged"
],
"path" => "/home/xmldata/in.xml",
"@timestamp" => 2017-06-13T09:48:52.206Z,
"date_time" => "06-08-2017 19:15:02",
"Install_Date" => [
[ 0] "null",
[ 1] "01-19-2010",
[ 2] "null",
..................
],
"vendor" => [
[ 0] "Microsoft Corporation",
[ 1] "Intel",
[ 2] "null",
..................
],
"@version" => "1",
"host" => "ubuntu",
"cid" => [
[0] "DEVICE12345"
]
}

Regards

Is Ruby code in the filter section the best way to combine the values of Pname[0], Identity_No[0], Install_State[0],................ Pname[1], Identity_No[1], Install_State[1],................ and so on to generate events? If so, I think accessing 8 hashes and generating events will take a considerable amount of time. Is there a more efficient solution?

Is Ruby code in the filter section the best way to combine the values of Pname[0], Identity_No[0], Install_State[0],................ Pname[1], Identity_No[1], Install_State[1],................ and so on to generate events?

Sure, that works. I don't think there's any stock filter that allows you to do the same thing.

If so, I think accessing 8 hashes and generating events will take a considerable amount of time. Is there a more efficient solution?

I wouldn't worry about it, but it depends on your performance requirements.
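
For illustration, here is a minimal plain-Ruby sketch of the index-wise combination being discussed. The method name build_records and the sample hash are made up for this example and do not come from the thread. It assumes every array has the same length and lines up by index, which only holds if each item carries every property.

```ruby
# Plain Ruby sketch, no Logstash required.
# `fields` is a hypothetical stand-in for the array-valued fields
# that the xml filter's xpath option produced.
def build_records(fields)
  return [] if fields.empty?

  names = fields.keys
  first, *rest = fields.values
  # zip pairs up the elements found at the same index in every array,
  # then each row is turned into a { field name => value } hash.
  first.zip(*rest).map { |row| names.zip(row).to_h }
end

fields = {
  'Pname'         => ['Windows Genuine Advantage Validation Tool (KB892130)',
                      'Intel(R) PRO Network Connections 11.2.0.69',
                      'AddressBook'],
  'Identity_No'   => ['null', '{2222B364-0854-4265-B32E-A142DB9DC7BB}', 'null'],
  'Install_State' => ['5', '5', '5']
}

# One hash per software application.
build_records(fields).each { |record| puts record.inspect }
```

Inside a Logstash ruby filter the arrays would presumably come from event.get(...) and the result could be stored with event.set(...). A split filter on that field should then emit one event per record, but that part is untested against the data in this thread.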

Hi magnusbaeck,
Thanks for the reply. I don't know much Ruby. Any pointers on how I should proceed? Also, is the following a hash or an array?

"Pname" => [
[ 0] "Windows Genuine Advantage Validation Tool (KB892130)",
[ 1] "Intel(R) PRO Network Connections 11.2.0.69",
[ 2] "AddressBook",
..................
],
I want to declare and initialize the above event variable. Is it like this?

event => { "Pname" => [
"Windows Genuine Advantage Validation Tool (KB892130)",
"Intel(R) PRO Network Connections 11.2.0.69",
"AddressBook",
]
}
Regards

Also, is the following a hash or an array?

That's an array with three elements.

I want to declare and initialize the above event variable.

Sorry, I don't understand what you want to do.

Maybe I am asking you too much about Ruby in a Logstash forum :)

I want to try out merging the arrays in a separate Ruby program. Comparing with the rubydebug output, I am trying to declare an event array with all the values such as Pname, vendor, version, etc. Declaring it as follows gives me a lot of errors. I'm scratching my head with Ruby... first time.

event = [
"Pname" => [
"Windows Genuine Advantage Validation Tool (KB892130)",
"Intel(R) PRO Network Connections 11.2.0.69",
"AddressBook"
],
"Identity_No" => [
"null",
"{2222B364-0854-4265-B32E-A142DB9DC7BB}",
"null"
],
"Install_State" => [
"5",
"5",
"5"
]
]
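
A note on the declaration above: what rubydebug prints is a hash (field name => value) whose values happen to be arrays, so in standalone Ruby the outer delimiters would be braces rather than square brackets. The sketch below is only an illustration for experimenting outside Logstash. The method name sample_event is made up, and the values are copied from the rubydebug output earlier in the thread.

```ruby
# A plain Ruby Hash standing in for the Logstash event (assumption:
# this is for experiments in irb, not code for the ruby filter).
def sample_event
  {
    'Pname' => [
      'Windows Genuine Advantage Validation Tool (KB892130)',
      'Intel(R) PRO Network Connections 11.2.0.69',
      'AddressBook'
    ],
    'Identity_No' => [
      'null',
      '{2222B364-0854-4265-B32E-A142DB9DC7BB}',
      'null'
    ],
    'Install_State' => ['5', '5', '5']
  }
end

event = sample_event
puts event.class             # Hash
puts event['Pname'].class    # Array
puts event['Pname'][1]       # second element of the Pname array
```

Writing the same key/value pairs inside [ ... ] is also accepted by Ruby, but it produces an array containing a single hash, which is probably not what was intended.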

Hi,
I am using zip to form arrays from the values at the same index. The following seems to work in irb.

irb(main):001:0> users = ["john", "david", "peter"]
=> ["john", "david", "peter"]
irb(main):002:0> students = ["Mike", "Tim", "Monique"]
=> ["Mike", "Tim", "Monique"]
irb(main):003:0> list = users.zip(students)
=> [["john", "Mike"], ["david", "Tim"], ["peter", "Monique"]]

If I do the same in the Logstash filter section, as below:

ruby {
               code => " list = version.zip( vendor )  "
       }

I get the following error:

ERROR logstash.filters.ruby - Ruby exception occurred: undefined local variable or method `version' for #<LogStash::Filters::Ruby:0x25278b7>

Also, is the zip method the best way to combine the arrays for this scenario?

Hi,
The error went away after I updated the filter section with the following code:

ruby {
               code => " list = ['version'].zip( ['vendor'] )"
       }

Hi,
I have initialized a global variable in the code section that will hold the merged values:

ruby {
        init => "$eventlist = Array.new"
        code => "$eventlist = ['version'].zip(['vendor'])"
}
mutate {
        remove_field => [ "parsed" ]
        add_field => [ "finalevent", "%{$eventlist}" ]  # Adding the eventlist
}

The rubydebug displays -

"NEWFIELD" => "%{$eventlis]}",

The global variable lost its scope/value in the mutate section. Any idea how I should proceed?

Change your ruby filter to this:

ruby {
  code => "event.set('finalevent', ['version'].zip(['vendor']))"
}

Hi Magnus Baeck,
I have tried this method, but the rubydebug output shows the following:

"finalevent" => [
        [0] [
            [0] "version",
            [1] "vendor"
        ]
    ],

The rubydebug output correctly shows the values of vendor and version:
"vendor" => [
[ 0] "Microsoft Corporation",
[ 1] "Intel",
[ 2] "null",
[ 3] "Diebold",

.............................
.............................
"version" => [
[ 0] "1.7.0069.2",
[ 1] "null",
[ 2] "null",
[ 3] "3.0.0.1",

What's the difference between those two snippets?

Ah, Updated

Oh, so when you said ['version'] you actually meant "the contents of the version field"? Then replace ['version'] with event.get('version').
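
To make the difference concrete, here is a small plain-Ruby sketch. A Hash named fields stands in for the Logstash event, and the method names zip_literals and zip_fields are made up for illustration. In a real ruby filter the lookup would be event.get('version') as described above.

```ruby
# ['version'] is just a one-element array holding the string "version";
# it never looks at the event's fields.
def zip_literals
  ['version'].zip(['vendor'])
end

# Looking the arrays up by name first zips the actual field contents.
# `fields` is a plain Hash used here in place of the Logstash event.
def zip_fields(fields)
  fields.fetch('version').zip(fields.fetch('vendor'))
end

fields = {
  'version' => ['1.7.0069.2', 'null', 'null', '3.0.0.1'],
  'vendor'  => ['Microsoft Corporation', 'Intel', 'null', 'Diebold']
}

p zip_literals         # => [["version", "vendor"]]
p zip_fields(fields)   # one [version, vendor] pair per index
```

The first result matches the unexpected 'finalevent' value shown in the rubydebug output above.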

Hi Magnusbaeck, this one worked, and I also learned about the new event methods in ELK 5. Thank you very much.

Regards
NB: Wrongly marked this post as solution. Updated :innocent:

Hi Magnusbaeck,
The Ruby code that you suggested created the 'finalevent' array. But I have now realized that the event (finalevent) is still a single event. So I am trying to fetch the values from the finalevent array and create a separate field for each of its indices using the following Ruby code.
ruby {

 i = 0
              while i < event.get( 'finalevent' ).length do
                     event.set('Software Application' + i.to_s , event.get('finalevent[ + i.to_s + ])' )
                    //  event.set('Software Application' + i.to_s , event.get('finalevent[ i ])' )
              end

}
My intention is to get separate events like:

Software Application 1 [
values
]

Software Application 2 [
values
]
............................................
............................................

But the above code snippet is not working. It, or variations of it, always gives a syntax error or sometimes waits forever. I ran similar simple Ruby code (without the set or get methods) separately and it did work.

Any suggestions?

syntax error, unexpected kEND
end }
^

Regards

I'm not sure exactly why your code doesn't work but try

i = 0
while i < event.get( 'finalevent' ).length do
  event.set('Software Application' + i.to_s , event.get('finalevent')[i])
end

instead. Here's another way:

event.get('finalevent').each_with_index { |value, i|
  event.set('Software Application' + i.to_s, value)
}
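
One thing worth noting about the while version: as written it never increments i, so the loop condition stays true and the loop cannot finish whenever the array is non-empty. The plain-Ruby sketch below shows both styles with the increment added. The method names and the Hash used in place of the event are assumptions for illustration only.

```ruby
# while loop with the counter advanced on every pass.
def label_with_while(items)
  labelled = {}
  i = 0
  while i < items.length
    labelled['Software Application' + i.to_s] = items[i]
    i += 1 # without this line i stays 0 and the loop never ends
  end
  labelled
end

# each_with_index supplies the counter itself, so there is nothing to forget.
def label_with_each(items)
  labelled = {}
  items.each_with_index do |value, i|
    labelled['Software Application' + i.to_s] = value
  end
  labelled
end

items = [['1.7.0069.2', 'Microsoft Corporation'], ['null', 'Intel']]
p label_with_while(items)
p label_with_each(items)
```

Both methods build the same result; the iterator form is usually the more idiomatic choice in Ruby.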

Hi Magnusbaeck,
The second code snippet worked. The first one did not produce any compile error, but at run time Logstash stalled. Now, coming back to the event generated by the code snippet, the JSON looks like the following:

{
"_index": "logstash-xml-16.06.2017",
"_type": "logs",
........................
"_source": {
"Software Application 0": [
"Windows Genuine Advantage Validation Tool (KB892130)",
"Windows Genuine Advantage Validation Tool (KB892130)",
......................................................
],
"Software Application 1": [
"Intel(R) PRO Network Connections 11.2.0.69",
"Intel(R) PRO Network Connections 11.2.0.69",
"Intel(R) PRO Network Connections 11.2.0.69",
.......................................................
],

So basically I am still getting a single event in Kibana. To make it multiple events I used the metricize plugin, and it did generate an event for each 'Software Application', as I can see in the rubydebug output, but in Kibana I get only one count (bar), for 'Software Application 0'.

Isn't it possible to get multiple events/counts/bars for the above JSON ('Software Application')? For example, for 5 'Software Application' tags there would be 5 events/counts/bars in Kibana.
The JSON file is here

Regards