Parsing an XML document using XPath


(Usha Datt) #1

I am trying to parse the following XML data using Logstash. It works for a single document, but when I increase the number of documents, it stops working.

<Book:Body>
    <Book:Head>
        <bookname>Book:Name</bookname>
        <ns:Hello xmlns:ns="www.example.com">
            <ns:BookDetails>
                <ns:ID>123456</ns:ID>
                <ns:Name>ABC</ns:Name>
            </ns:BookDetails>
        </ns:Hello xmlns:ns="www.example.com">
    </Book:Head>
<Book:Body>

My config file is as follows:

multiline {
    pattern => "<Book:Body>"
    what => "previous"
    negate => "true"
}

xml {
    store_xml => "false"
    source => "message"
    remove_namespaces => "true"

    xpath => [
        "/Book/Book/BookDetails/ID/text()", "UUID",
        "/Book/Book/BookDetails/Name/text()", "Name"
    ]
}

mutate {
    add_field => ["IDIndexed", "%{ID}"]
    add_field => ["NameIndexed", "%{Name}"]
}

(Magnus Bäck) #2

Could you be a bit more specific than "it's not working"? What do you get? Is there anything interesting in the logs? What do you mean by "multiple documents", multiple consecutive Book:Body elements in the same file...?


(Usha Datt) #3

Yes, I mean multiple consecutive Book:Body elements in the same file. With just one entry, as in my example, it parses the two fields ID and Name, and the mutate filter adds the new fields. But with multiple records, it cannot parse the message, and the fields %{ID} and %{Name} appear as is, without any values. Is there something wrong with my multiline pattern or xpath?


(Magnus Bäck) #4

The multiline pattern looks okay. I suggest you simplify things by removing the xml filter and just emitting messages with the joined XML lines. What happens then if you feed Logstash a file with multiple Book:Body elements?
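
For reference, a stripped-down pipeline for that test might look roughly like the sketch below. Only the multiline settings come from the config above; the input and output sections were not posted, so the file path, start_position and sincedb_path values here are assumptions chosen to make the file easy to re-read while testing.

```
input {
    file {
        path => "/path/to/books.xml"     # placeholder path, not from the original post
        start_position => "beginning"    # assumption: read the whole file on first run
        sincedb_path => "/dev/null"      # assumption: forget the read position between test runs
    }
}

filter {
    # Same settings as the posted config: every line that does NOT match
    # <Book:Body> is appended to the previous event.
    multiline {
        pattern => "<Book:Body>"
        negate => "true"
        what => "previous"
    }
}

output {
    # Print each joined event so the message field can be inspected.
    stdout { codec => rubydebug }
}
```

If every printed event contains exactly one complete Book:Body element in its message field, the line joining is fine and the problem is more likely in the xml filter.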

BTW, your example Book:Body element ends with <Book:Body> rather than </Book:Body>. I assume that was a typo?


(Usha Datt) #5

Actually, I tried the example again without the namespace ns, and Logstash was able to parse the document. But when I use ns in all the tags, as in the Book:Body elements, it does not parse it. I have even set the remove_namespaces option in the xml filter. I guess the problem is due to the namespaces in the XML tags. I was working with this example without namespaces:

    <Book>
        <bookname>Book:Name</bookname>
        <Hello>
            <BookDetails>
                <ID>123456</ID>
                <Name>ABC</Name>
            </BookDetails>
        </Hello>
    </Book>

Yes, sorry for the typo! It should be </Book:Body>.
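
One thing that might be worth trying, as a sketch rather than a confirmed fix: the xpath expressions in the config as posted start at /Book/Book, which does not match the nesting in either sample, and the ID value is stored in a field named UUID while the mutate filter reads %{ID}. The variant below assumes remove_namespaces strips the ns prefix as documented, and it uses // so the expressions do not depend on what the wrapper elements are called after parsing.

```
xml {
    source => "message"
    store_xml => "false"
    remove_namespaces => "true"
    # Match BookDetails anywhere in the document, and store the values
    # under the same names that the mutate filter references below.
    xpath => {
        "//BookDetails/ID/text()" => "ID"
        "//BookDetails/Name/text()" => "Name"
    }
}

mutate {
    add_field => {
        "IDIndexed" => "%{ID}"
        "NameIndexed" => "%{Name}"
    }
}
```

If %{ID} still shows up literally with this config, the xpath is not matching at all, which would point back at the namespace handling rather than at the field names.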


(Navneet Mathpal) #6

+1, I am getting the same issue (remove_namespaces => true is not removing the namespaces).
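
If remove_namespaces really is not taking effect, a possible alternative is to keep the prefixes and register them explicitly. This is only a sketch: it assumes an xml filter plugin version that supports the namespaces option, and it reuses the www.example.com URI from the sample in the first post. That sample does not declare the Book prefix, so the expressions use // to avoid naming the Book:Body and Book:Head wrappers.

```
xml {
    source => "message"
    store_xml => "false"
    # Map the prefix used in the xpath expressions to the namespace URI
    # declared in the sample (xmlns:ns="www.example.com").
    namespaces => {
        "ns" => "www.example.com"
    }
    xpath => {
        "//ns:BookDetails/ns:ID/text()" => "ID"
        "//ns:BookDetails/ns:Name/text()" => "Name"
    }
}
```

Whether this behaves differently from remove_namespaces depends on the plugin version in use, so it is worth checking the result with a stdout output before relying on it.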

