Unable to load an XML document into Elasticsearch

Hi Team, I am trying to load a whole XML document into Elasticsearch using Logstash. I don't see any errors in the Logstash console, but the document does not appear in Elasticsearch either. My config file is below.

input {
  file {
    path => "/Users/userid/POC/ELK/data/data.xml"
    start_position => "beginning"
    sincedb_path => "nul"
    type => "xml"
    codec => multiline {
      pattern => "<Document"
      negate => "true"
      what => "previous"
      auto_flush_interval => 2
    }
  }
}

filter {
    xml {
      source => "message"
      remove_namespaces => true
      target => "doc"
  }
}
output {
  stdout {
  }

  elasticsearch {
    hosts => ["localhost:9200"]
    index => "document"
  }
}

Sample XML document

<?xml version="1.0" encoding="UTF-8"?>
<Document> 
    <recordTarget>
          <role>
                   .....
                  ...... have multiple internal tags....
          </role>
    </recordTarget>
</Document>

You do not have a field called [ClinicalDocument]. The file input will create a field called [message].

If you do not want the in-memory sincedb persisted across restarts then use sincedb_path => "NUL" on Windows and sincedb_path => "/dev/null" on UNIX.
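Since the path in the posted config is under /Users, this looks like a macOS setup, so the UNIX form would apply. A minimal sketch of that one-line change (everything else in the file input stays as in the original config):

    sincedb_path => "/dev/null"

With "nul" on a non-Windows system, the file input would instead create a regular file named nul and persist the sincedb there, so the file would not be re-read from the beginning on subsequent runs.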

Hi @Badger, thanks for your response. I have updated the config with the correct tag. I also tried using "message" as the source, but I am still getting the same result: no error messages, yet nothing shows up in Elasticsearch.

Hi @Badger, I was able to see the XML in Kibana after running the Logstash command with sudo. But the XML parsing is failing at different stages. A single XML file created 7 events, with the tags multiline, multiline_codec_max_lines_reached, and _xmlparsefailure. I would like to create a single event for the whole XML content. Below is my current config file.

input {
    file {
        path => "/Users/user/POC/ELK/data/data.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
            pattern => "<Document"
            negate => "true"
            what => "previous"
            auto_flush_interval => 1
            max_lines => 2000
        }
    }
}
filter {
    xml {
      source => "message"
      target => "xml_content"
    }
}
output {
    stdout { codec => rubydebug }
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "document"
    }
}

If the multiline codec stops accumulating lines before it reaches the next <Document element, then the accumulated event will not be valid XML and the xml filter will not be able to parse it.
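The multiline_codec_max_lines_reached tag suggests that is what happened here: the event was cut off at max_lines => 2000 before the document ended, so the fragment failed XML parsing. A sketch of one way to address that, assuming the whole document should become a single event (the values shown are illustrative, not taken from the thread, and would need to exceed the actual size of the file):

    codec => multiline {
      pattern => "<Document"
      negate => "true"
      what => "previous"
      max_lines => 100000          # must exceed the total line count of the file
      auto_flush_interval => 2     # flushes the last event, since no further <Document follows it
    }

If the file is large, the codec's max_bytes limit may also need raising, since an event is flushed when either limit is hit.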

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.