Complex xml/json data insert

Hi there!
I'm new to the ELK stack and need some advice on inserting my complex XML files into it.

Here you can see an example of just one of my 10,163 XML files.

I have used the xml filter in Logstash, but this creates a lot of separate, unlinked documents. In other words, I lose all of the (hierarchical) structure.
Is there a way to use the xml filter in Logstash while keeping the structure, or to get the XML data into a single document?

I have also tried converting the XML to JSON and using the json filter in Logstash, but my data produces a lot of mapping errors. Using a separate index for each XML file doesn't help either, because the mapping errors occur within a single file.
I was thinking about splitting my JSON data into separate documents so I don't get mapping errors.

Am I missing an awesome way to get these huge files into the ELK stack?

My xml filter

filter {
	xml {
		remove_namespaces => true
		store_xml => false
		source => "[message]"
		target => "[log_event]"
		force_array => false
	}
}

This creates documents like these (only a small part shown):

logstash_1       | {
logstash_1       |        "message" => "    <al> De wet regelt het toezicht op deze besturen. Vernietiging van besluiten van deze besturen kan alleen geschieden wegens strijd met het recht of het algemeen belang. </al>",
logstash_1       |           "port" => 43214,
logstash_1       |           "host" => "gateway",
logstash_1       |       "@version" => "1",
logstash_1       |     "@timestamp" => 2019-05-06T14:46:45.574Z
logstash_1       | }
logstash_1       | {
logstash_1       |        "message" => "  <meta-data><jcis><jci versie=\"1.3\" verwijzing=\"jci1.3:c:BWBR0001840&amp;hoofdstuk=7&amp;artikel=134&amp;lid=3&amp;z=2018-12-21&amp;g=2018-12-21\" onderdeel=\"lid=3\"/></jcis></meta-data></lid>",
logstash_1       |           "port" => 43214,
logstash_1       |           "host" => "gateway",
logstash_1       |       "@version" => "1",
logstash_1       |     "@timestamp" => 2019-05-06T14:46:45.574Z
logstash_1       | }
logstash_1       | {
logstash_1       |        "message" => "<meta-data><brondata><oorspronkelijk><publicatie effect=\"tekstplaatsing-wijziging\" soort=\"Stb\" urlidentifier=\"stb-2019-33\"><publicatiejaar>2019</publicatiejaar><publicatienr>33</publicatienr><uitgiftedatum isodatum=\"2019-02-08\">08-02-2019</uitgiftedatum><ondertekeningsdatum isodatum=\"2019-01-16\">16-01-2019</ondertekeningsdatum></publicatie></oorspronkelijk><inwerkingtreding><publicatie effect=\"tekstplaatsing-wijziging\" soort=\"Stb\" urlidentifier=\"stb-2018-493\"><publicatiejaar>2018</publicatiejaar><publicatienr>493</publicatienr><uitgiftedatum isodatum=\"2018-12-21\">21-12-2018</uitgiftedatum><ondertekeningsdatum isodatum=\"2018-11-26\">26-11-2018</ondertekeningsdatum><dossierref dossier=\"34716\">34716</dossierref></publicatie><inwerkingtreding.datum isodatum=\"2018-12-21\">21-12-2018</inwerkingtreding.datum></inwerkingtreding></brondata><jcis><jci versie=\"1.0\" verwijzing=\"1.0:c:BWBR0001840&amp;artikel=134&amp;g=2018-12-21\" onderdeel=\"artikel=134\" korte-verwijzing=\"1.0:c:BWBR0001840&amp;artikel=134\"/><jci versie=\"1.3\" verwijzing=\"jci1.3:c:BWBR0001840&amp;hoofdstuk=7&amp;artikel=134&amp;z=2018-12-21&amp;g=2018-12-21\" onderdeel=\"artikel=134\" korte-verwijzing=\"jci1.3:c:BWBR0001840&amp;artikel=134\"/></jcis></meta-data></artikel><artikel bwb-ng-variabel-deel=\"/Hoofdstuk7/Artikel135\" stam-id=\"2991503\" versie-id=\"25690692\" id=\"C36648521\" label-id=\"2941964\" inwerking=\"2018-12-21\" label=\"Artikel 135\" bron=\"Stb.2019-33\" effect=\"tekstplaatsing-wijziging\" ondertekening_bron=\"2019-01-16\" publicatie_bron=\"2019-02-08\" publicatie_iwt=\"2018-12-21\" status=\"goed\">",
logstash_1       |           "port" => 43214,
logstash_1       |           "host" => "gateway",
logstash_1       |       "@version" => "1",
logstash_1       |     "@timestamp" => 2019-05-06T14:46:45.574Z
logstash_1       | }

P.S. My filter doesn't seem to work properly.

Your examples are not valid XML. The file you link to is missing a closing tag for "wetgeving". If you had valid XML (for example, your second message with the trailing </lid> removed)

<meta-data><jcis><jci versie="1.3" verwijzing="jci1.3:c:BWBR0001840&amp;hoofdstuk=7&amp;artikel=134&amp;lid=3&amp;z=2018-12-21&amp;g=2018-12-21" onderdeel="lid=3"/></jcis></meta-data>

Then if you set "store_xml => true" you would get

 "log_event" => {
    "jcis" => {
        "jci" => {
                "versie" => "1.3",
            "verwijzing" => "jci1.3:c:BWBR0001840&hoofdstuk=7&artikel=134&lid=3&z=2018-12-21&g=2018-12-21",
             "onderdeel" => "lid=3"
        }
    }
},

Hmm, it seems my link did point to invalid XML (my fault). When I checked the original file, it was valid XML.

So now I'm using

filter {
	xml {
		remove_namespaces => true
		store_xml => true
		source => "[message]"
		target => "[log_event]"
		force_array => false
	}
}

with this XML file.

And now I'm getting these errors:

logstash_1       | [2019-05-06T19:08:17,884][WARN ][logstash.filters.xml     ] Error parsing xml with XmlSimple {:source=>"[message]", :value=>"            </entry>", :exception=>#<REXML::ParseException: Missing end tag for '' (got "entry")
logstash_1       | Line: 1
logstash_1       | Position: 20
logstash_1       | Last 80 unconsumed characters:
logstash_1       | [2019-05-06T19:08:17,890][WARN ][logstash.filters.xml     ] Error parsing xml with XmlSimple {:source=>"[message]", :value=>"      <tgroup align=\"left\" char=\"\" charoff=\"50\" cols=\"2\" colsep=\"0\" rowsep=\"0\">", :exception=>#<REXML::ParseException: No close tag for /tgroup
logstash_1       | Line: 1
logstash_1       | Position: 79
logstash_1       | Last 80 unconsumed characters:
logstash_1       | [2019-05-06T19:08:17,891][WARN ][logstash.filters.xml     ] Error parsing xml with XmlSimple {:source=>"[message]", :value=>"        <tbody valign=\"top\">", :exception=>#<REXML::ParseException: No close tag for /tbody
logstash_1       | Line: 1
logstash_1       | Position: 28
logstash_1       | Last 80 unconsumed characters:
logstash_1       | [2019-05-06T19:08:17,895][WARN ][logstash.filters.xml     ] Error parsing xml with XmlSimple {:source=>"[message]", :value=>"      </tgroup>", :exception=>#<REXML::ParseException: Missing end tag for '' (got "tgroup")

So it seems it is still splitting my original XML into separate lines, but now it fails to parse them.

Are you using a multiline codec on a file input? My guess is that the XML spans multiple lines and they need to be combined before trying to parse it.
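To illustrate why the lines need to be combined first, here is a minimal Python sketch (not Logstash itself). The four-element sample document is hypothetical and only borrows tag names from the warnings above. It assumes that each line arrives as its own event, which is what the warnings suggest. Most lines are not well-formed XML on their own, while a line that happens to hold a complete element parses as an isolated fragment.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, using tag names that appear in the warnings above.
doc = (
    '<tgroup cols="2">\n'
    '  <tbody valign="top">\n'
    '    <entry>text</entry>\n'
    '  </tbody>\n'
    '</tgroup>\n'
)


def parses(text):
    """Return True if text is a well-formed XML document on its own."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One parse attempt per line, the way a line-oriented input hands over events.
per_line = [parses(line) for line in doc.splitlines()]

# One parse attempt for the whole document as a single event.
whole = parses(doc)

print("per line:", per_line)
print("whole document:", whole)
```

Only the line holding the complete entry element parses on its own, and it has lost its parents, which would explain both the parse warnings and the unlinked documents.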

I'm using TCP

input {
	tcp {
		port => 5000
	}
}

And then just nc localhost 5000 < file.xml

Indeed, my XML spans multiple lines. How would I use a multiline codec?
I don't necessarily need TCP, it just seemed the easiest.

I would do it using a file input. By using a pattern that never matches, you can consume the entire file as a single event...

input {
    file {
        path => "/home/user/file.xml"
        sincedb_path => "/dev/null"
        start_position => "beginning"
        codec => multiline {
            pattern => "^Spalanzani"
            what => "previous"
            negate => true
            auto_flush_interval => 5
            max_lines => 2100
        }
    }
}
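The effect of a never-matching pattern with negate => true and what => "previous" can be modelled in a few lines of Python. This is a simplified sketch of the grouping rule only, not the codec's actual implementation: the function name multiline_previous is made up for illustration, and max_lines and auto_flush_interval are ignored.

```python
import re


def multiline_previous(lines, pattern, negate):
    """Simplified model of what => "previous".

    A line that matches the pattern (or, with negate, does not match it)
    is appended to the previous event; any other line starts a new event.
    """
    rx = re.compile(pattern)
    events = []
    for line in lines:
        belongs_to_previous = bool(rx.search(line))
        if negate:
            belongs_to_previous = not belongs_to_previous
        if belongs_to_previous and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


lines = ["<wetgeving>", "  <artikel>", "  </artikel>", "</wetgeving>"]

# No line starts with "Spalanzani", so with negate every line joins the first.
events = multiline_previous(lines, r"^Spalanzani", negate=True)
print(len(events))
print(events[0])
```

Because no line ever matches, every line counts as a continuation and the whole file ends up in one event, as long as it stays within the codec's line limit.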

I feel like I'm close, but I still get errors with

input {
    file {
        path => "/Users/nielsvanrijn/Desktop/bwb/bwb-repository-2019-02-23/BWBR0001840/2018-12-21_0/xml/BWBR0001840_2018-12-21_0.xml"
        sincedb_path => "/dev/null"
        start_position => "beginning"
        codec => multiline {
            pattern => "^Spalasfdgnzani"
            what => "previous"
            negate => true
            auto_flush_interval => 5
            max_lines => 5000
        }
    }
}
filter {
	xml {
		remove_namespaces => true
		store_xml => true
		source => "[message]"
		target => "[log_event]"
		force_array => false
	}
}

and this XML:

logstash_1       | [2019-05-06T19:08:27,952][WARN ][logstash.filters.xml     ] Error parsing xml with XmlSimple {:source=>"[message]", :value=>"            <entry colname=\"Col2\" colsep=\"0\" morerows=\"0\" rotate=\"0\" rowsep=\"0\">", :exception=>#<REXML::ParseException: No close tag for /entry
logstash_1       | Line: 1
logstash_1       | Position: 80
logstash_1       | Last 80 unconsumed characters:
logstash_1       | [2019-05-06T19:08:27,954][WARN ][logstash.filters.xml     ] Error parsing xml with XmlSimple {:source=>"[message]", :value=>"            <entry colname=\"Col3\" colsep=\"0\" morerows=\"0\" rotate=\"0\" rowsep=\"0\">", :exception=>#<REXML::ParseException: No close tag for /entry
logstash_1       | Line: 1
logstash_1       | Position: 80
logstash_1       | Last 80 unconsumed characters:
logstash_1       | [2019-05-06T19:08:27,969][WARN ][logstash.filters.xml     ] Error parsing xml with XmlSimple {:source=>"[message]", :value=>"            </entry>", :exception=>#<REXML::ParseException: Missing end tag for '' (got "entry")

I would be very surprised to get that error message from that XML because that XML does not contain the word "entry".

With that XML I get an error saying there is no closing tag for wetgeving/toestand. That is caused by the lack of a newline at the end of the last line of the file. If you edit the file and add the newline then the multiline codec will consume the last line and the xml filter parses as expected.
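With thousands of files, adding the newline by hand gets tedious. Here is a small Python sketch that appends one only when it is missing. The helper name ensure_trailing_newline and the sample file are made up for illustration, and it assumes a plain "\n" line ending is what you want.

```python
import os
import tempfile


def ensure_trailing_newline(path):
    """Append a newline to the file at path if its last byte is not one.

    Returns True if the file was modified, False otherwise.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file: there is no last line to terminate
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False  # already ends with a newline
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True


# Demo on a throwaway file whose last line has no newline.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.xml")
    with open(sample, "wb") as f:
        f.write(b"<wetgeving>\n</wetgeving>")
    print(ensure_trailing_newline(sample))  # first call modifies the file
    print(ensure_trailing_newline(sample))  # second call leaves it alone
```

You could call the helper for each XML file before pointing the file input at it.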

Yup, I fixed my problem! Thank you, Badger!
