Remove xml-tags during parsing


#1

Hello,

So far I've only had experience indexing CSV files into Elasticsearch with Logstash. Now I have a couple of XML files that I want to index, and fortunately that was not too difficult. With the most basic config I managed to get the data into Elasticsearch:

filter
{
    xml  {
      source => "VINAnalysis"
      remove_tag => [ "%{SystemInfo}"]
    }
}

The VINAnalysis tag encloses the whole XML file, so I used it as my source. When I look at the data in Kibana, I see that Logstash has indexed all the XML tags. I want to get rid of them because I don't want them to be searchable.

I thought I could remove those tags with the remove_tag option. One of the XML tags is

<SystemInfo>data: data</SystemInfo>

I added it to my config, which you can see above, but the tag is still being indexed. What am I doing wrong?


(Magnus Bäck) #2

Perhaps it would be a better idea to use the xpath option to selectively save things you do want to save, instead of extracting everything and ripping out the boring stuff?

  remove_tag => [ "%{SystemInfo}"]

There are several reasons why this doesn't work.

  • "Tags" in Logstash have nothing to do with XML tags in parsed XML documents.
  • Use the %{foo} notation only when you want to expand the contents of a field. In this case you want to reference a field by name, so no %{} is needed.
  • The SystemInfo field is a nested field, so you need to access it via e.g. [VINAnalysis][SystemInfo] or whatever the structure looks like.
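Putting the second and third points together, a minimal sketch of dropping a parsed field could look like this. It uses the common remove_field option (which deletes event fields, unlike remove_tag, which only touches the event's tags list) with a plain nested field reference. Several things here are assumptions, not taken from the thread: that the raw XML arrives in the message field (typical for a file input, usually with a multiline codec so the whole document lands in one event), that the parsed document is stored under a hypothetical target named VINAnalysis, and that SystemInfo is a direct child of the root element. Adjust the field reference to the structure you actually see in Kibana.

```
filter {
  xml {
    # Assumption: the raw XML document is in the "message" field
    source => "message"
    # Hypothetical target: the parsed document is stored under [VINAnalysis]
    target => "VINAnalysis"
    # remove_field (not remove_tag) deletes event fields. Reference the
    # nested field by name, without the %{...} notation.
    # Assumes SystemInfo is a direct child of the XML root element.
    remove_field => [ "[VINAnalysis][SystemInfo]" ]
  }
}
```

remove_field is only applied when the filter succeeds, so it runs after the XML has been parsed into the target field.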

#3

Thank you for clearing it up!

I'm playing around with xpath and have indexed the file again, but I'm not sure what I should expect in the results. This is the structure of the file:

<VINAnalysis>
<BasicInfo>
<Info1>Hash:</Info1><InfoVale>0000000000AAAAA</InfoVale>
<Info2>More than 1 found:</Info2><InfoVale>No</InfoVale>
<Info3>Save Time:</Info3><InfoVale>2016-11-25 15:38:30</InfoVale>

This is part of my conf:

filter
{
    xml  {
      source => "VINAnalysis"
      xpath => {
        "//VINAnalysis/BasicInfo/Info1/InfoVale" => "hash"
        "//VINAnalysis/BasicInfo/Info2/InfoVale" => "more_than_1_found"
        "//VINAnalysis/BasicInfo/Info3/InfoVale" => "save_time"
      }
    }
}

But I still see the data with the tags in Kibana.

Any ideas what I'm doing wrong?


(Magnus Bäck) #4

Make sure you set store_xml => false.
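A sketch of the config from #3 with store_xml => false added. Without it, the xml filter also stores the whole parsed document, so the full structure still shows up next to the xpath results. Two further adjustments are suggestions worth checking against the real data. First, in the sample structure each InfoVale is a sibling of its InfoN element rather than a child, so a path like .../Info1/InfoVale probably matches nothing. Second, selecting an element returns its markup including the tags, while appending text() returns only the text content. The source field message is again an assumption.

```
filter {
  xml {
    # Assumption: the raw XML document is in the "message" field
    source => "message"
    # Do not store the whole parsed document; keep only the xpath results
    store_xml => false
    xpath => {
      # InfoVale follows its InfoN element as a sibling, so step along the
      # following-sibling axis; text() yields the value without the tags
      "//VINAnalysis/BasicInfo/Info1/following-sibling::InfoVale[1]/text()" => "hash"
      "//VINAnalysis/BasicInfo/Info2/following-sibling::InfoVale[1]/text()" => "more_than_1_found"
      "//VINAnalysis/BasicInfo/Info3/following-sibling::InfoVale[1]/text()" => "save_time"
    }
  }
}
```

following-sibling::InfoVale[1] selects the nearest InfoVale after each InfoN element, which matches the alternating InfoN/InfoVale layout shown in #3.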

