XML XPath filter is parsing fields but not inserting in Elasticsearch

Hi @magnusbaeck

I am trying to parse the XML below and insert the records into Elasticsearch:

<xmldata>
 <head1>
  <key1>Value1</key1>
  <key2>Value2</key2>
  <id>0001</id>
  <date>01-01-2016 09:00:00</date>
 </head1>
 <head2>
  <key3>Value3</key3>
 </head2>
</xmldata>

My configuration file looks like this:

input {
 file {
  path => ["C:/somepath/text20.xml"]
  start_position => beginning
  sincedb_path => "/dev/null"
  codec => multiline
  {
   pattern => "<xmldata"
   negate => "true"
   what => "previous"
   auto_flush_interval=>2
  }
 }
}
filter {
  xml {
    source => "message"
    target => "data"
    store_xml => false
    xpath => {
      "/xmldata/head1/id/text()" => "id"
      "/xmldata/head1/key1/text()" => "key1"
      "/xmldata/head1/key2/text()" => "key2"
    }
  }
}
output {

 elasticsearch {
  codec => json
  index => "xtest1"
  hosts => ["localhost:9200"]
  document_type => "data"
 }
  stdout { codec => rubydebug }
}

I can see the fields extracted by XPath on the console via stdout, but those fields are missing in Elasticsearch. My console output looks like this:

[2018-03-14T13:46:44,391][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}
{
       "message" => "<xmldata>\r\n <head1>\r\n  <key1>Value1</key1>\r\n  <key2>Value2</key2>\r\n  <id>0001</id>\r\n  <date>01-01-2016 09:00:00</date>\r\n </head1>\r\n <head2>\r\n  <key3>Value3</key3>\r\n </head2>\r\n</xmldata>\r",
    "@timestamp" => 2018-03-14T12:46:44.731Z,
      "@version" => "1",
          "tags" => [
        [0] "multiline"
    ],
          "path" => "C:/somepath/text20.xml",
          "host" => "4000511768",
            "id" => [
        [0] "0001"
    ],
          "key1" => [
        [0] "Value1"
    ],
          "key2" => [
        [0] "Value2"
    ]
}

But the id, key1, and key2 fields are not present in my Elasticsearch index.
When I add the following to my config file:

mutate {
  add_field => {
    "ID" => "%{id}"
    "KEY1" => "%{key1}"
    "KEY2" => "%{key2}"
  }
  remove_field => ["message"]
}

Then I can see those fields in my Elasticsearch index.

But shouldn't the fields parsed by XPath be present in Elasticsearch as well, if I can see them on stdout? Can someone help me resolve this issue?

Elasticsearch version: 5.6.1. Logstash version: 6.2.2.

Interesting that it even works; there are a few syntax errors that might be the cause. You've got xpath enclosed in curly brackets instead of square brackets, and there is no comma after each entry to separate the paths. Give this a shot and see if it fixes it.

filter {
  xml {
    source => "message"
    target => "data"
    store_xml => false
    xpath => [
      "/xmldata/head1/id/text()", "id",
      "/xmldata/head1/key1/text()", "key1",
      "/xmldata/head1/key2/text()", "key2"
    ]
  }
}

xpath => [
  "/xmldata/head1/id/text()", "id",
  "/xmldata/head1/key1/text()", "key1",
  "/xmldata/head1/key2/text()", "key2"
]

I tried this too and got the same results. Syntactically, the xpath setting is equivalent whether it is written with curly braces or square brackets. The issue is that I can see the fields on the console via stdout, but they are not being added to the Elasticsearch index. When I use mutate's add_field, I get the desired result in Elasticsearch:
add_field => { "ID" => "%{id}" }

Seeing the output on the console via stdout but not in the ES index is strange. I wonder if anyone has faced this issue before.

Yeah, I understand your issue. In my very limited experience I've never seen this kind of behavior, which is why I suggested the syntax changes; I can't think of any other reason for this to be happening. Have you enabled debug logging for Logstash and Elasticsearch? The logs may shed more light on the issue.
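
One more thing that may be worth trying, as a sketch rather than a known fix: rubydebug pretty-prints the event as Ruby objects, while Elasticsearch receives it as JSON, so printing the event as JSON might show whether the XPath fields survive serialization. The snippet below only adds a second stdout output with the json_lines codec next to the existing one; I'm assuming the rest of your output section stays unchanged.

```
output {
  # existing human-readable dump of the event
  stdout { codec => rubydebug }
  # the same event serialized as one JSON document per line
  stdout { codec => json_lines }
}
```

If id, key1 and key2 are missing or look odd in the JSON line, that would point at how the values are serialized rather than at the elasticsearch output settings.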

Yes, I have gone through each of the logs, but they show nothing wrong. Am I missing a setting in the elasticsearch output? I ask because the rubydebug output on stdout shows everything correctly.

XPath is also parsing everything correctly; otherwise add_field => { "ID" => "%{id}" } would not have picked up the id and the other fields. The rubydebug output always shows everything correctly, so the only thing left to check is the elasticsearch output section.

I wonder whether you are not seeing them in ES the way you expect because they are arrays. Adding the add_field has this effect on the rubydebug output:

            "id" => [
        [0] "0001"
    ],
            "ID" => "0001",
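
If the array wrapping does turn out to be the problem, a minimal sketch of a workaround, assuming each XPath expression matches exactly one node, is to overwrite each field with its first element using a field reference. This is untested against your setup and only an illustration; it would go after the xml filter.

```
filter {
  mutate {
    # overwrite each single-element array with its first value
    replace => {
      "id"   => "%{[id][0]}"
      "key1" => "%{[key1][0]}"
      "key2" => "%{[key2][0]}"
    }
  }
}
```

Note that if an expression matches nothing, the field would end up holding the literal text of the reference (for example %{[id][0]}), so this only makes sense when the elements are always present.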

As per the documentation, values returned by XPath parsing are put in the destination field, and multiple values are pushed onto the destination field as an array. But I am parsing a simple XML document with no repeated values, and I don't see any destination field added to ES, although I do see the fields in the rubydebug output.

After adding add_field, I see the correct field in ES. Am I missing something, given that even single entries are stored as arrays? And why aren't they pushed to ES?

Yes, even single entries are stored as arrays. There is a "force_array => false" option, but for me it does not seem to work most of the time (although sometimes it does). I have no idea why the fields do not go to ES if they appear in the rubydebug output.

I tried force_array, but I still see arrays in the rubydebug output, and those fields are still missing from ES.
