Please give me an example of the Logstash xml filter

Hello everyone. I'm new to the Elastic Stack.
I'd like to use the xml filter to get my output in a particular format.

Please give me a simple xml filter sample.

My XML source looks like this:

<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<m2m:cin xmlns:m2m=\"http://www.onem2m.org/xml/protocols\"
		 xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">
	<ty>4</ty>
	<ri>CI00000000001244656716</ri>
	<rn>CI00000000001244656716</rn>
	<pi>CT00000000000000046769</pi>
	<ct>2018-02-05T15:06:30+09:00</ct>
	<lt>2018-02-05T15:06:30+09:00</lt>
	<ppt>
		<gwl>36.83115, 127.11185, 76</gwl>
		<geui>0017b2fffe0ad93e</geui>
	</ppt>
	<sr>/0240771000000168/v1_0/remoteCSE-00000168000c05c016104807/container-LoRa//subscription-SS00000000000000261472</sr>
	<et>2018-02-06T15:06:30+09:00</et>
	<st>11785</st>
	<cr>RC00000000000000050648</cr>
	<cnf>LoRa/Sensor</cnf>
	<cs>76</cs>
	<con>010400003039499602d2499602d203e703e70000000003e703e70000000000000000fff51234</con>
</m2m:cin>

I only want the value inside <con> ... </con>:

010400003039499602d2499602d203e703e70000000003e703e70000000000000000fff51234

When posting code, be sure to use the Preformatted Code button; the icon looks like </>. This preserves all the tags and makes reading code WAY easier.

There are a couple of questions you need to answer before you get to the XML piece. Will you ever want to associate more than just the con field with each event? Does each file contain only one <con> tag? Once you can answer these questions, you'll know how to construct your multiline codec. By default, Logstash generates one event per line of input. For example, what you posted is a single event to you, but Logstash will see it as 22 separate events. Your input section will look something like:

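input {
  file {
    # placeholder path; the file input requires one
    path => "/path/to/your/files/*.xml"
    codec => multiline {
      # pattern is a regular expression: anchor it and bracket the "?" so that
      # only the XML declaration matches (an unescaped "<?xml" also matches "xmlns")
      pattern => "^<[?]xml"
      negate => true
      what => "previous"
    }
  }
}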

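This says that any line that does not start with <?xml is appended to the previous line, so everything up to the next XML declaration is joined into one event. What you should get, based on your example, is a single event containing the whole document. Now you can build out your XPath in your filter section: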

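filter {
  xml {
    # parse the XML held in the message field
    source => "message"
    # only the xpath results are needed, not the whole document as a hash
    store_xml => false
    # "//" matches the element anywhere in the document; the root here is m2m:cin
    xpath => [
      "//con/text()", "FieldName1",
      "//ppt/gwl/text()", "FieldName2"
    ]
  }
}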

If your files have multiple con elements, their values will be placed in a single field as an array. So a file containing the following:

<con>1</con>
<con>2</con>
<con>3</con>
<con>4</con>

will create a field with the values 1, 2, 3, 4. If this is undesired and you want each value in its own event, you will need to adjust your multiline codec so that each instance of con ends up in its own event.
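
As an alternative to reworking the multiline codec, the split filter can turn an array field into one event per element. This is a minimal sketch, not a tested configuration; it assumes the xml filter example above has already stored the con values in FieldName1:

```
filter {
  # Emits one copy of the event per element of the FieldName1 array;
  # in each copy, FieldName1 holds a single value.
  split {
    field => "FieldName1"
  }
}
```

The split filter has to come after the xml filter in the pipeline, since it works on the field that the xpath setting creates.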

Thanks for your answer,
but the result of your example looks like this:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "parse",
        "_type": "doc",
        "_id": "Uk5BlGIBMu4GeP3SFr_i",
        "_score": 1,
        "_source": {
          "@timestamp": "2018-04-05T05:21:55.870Z",
          "message": """	xmlns:m2m=\"http://www.onem2m.org/xml/protocols\"
""",
          "headers": {
            "content_length": "809",
            "http_cache_control": "no-cache",
            "content_type": "application/xml",
            "http_postman_token": "c8d8b506-f8af-4ad0-bcf5-a0f53f9e8e05",
            "http_accept_encoding": "gzip, deflate",
            "request_method": "PUT",
            "request_path": "/",
            "http_version": "HTTP/1.1",
            "http_host": "localhost:8080",
            "http_user_agent": "PostmanRuntime/7.1.1",
            "request_uri": "/",
            "http_accept": "*/*",
            "http_connection": "keep-alive"
          },
          "host": "0:0:0:0:0:0:0:1",
          "@version": "1"
        }
      },
      {
        "_index": "parse",
        "_type": "doc",
        "_id": "UU5BlGIBMu4GeP3SFr-e",
        "_score": 1,
        "_source": {
          "message": """
<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<m2m:cin
""",
          "host": "0:0:0:0:0:0:0:1",
          "tags": [
            "multiline",
            "_xmlparsefailure"
          ],
          "@version": "1",
          "@timestamp": "2018-04-05T05:21:55.852Z",
          "headers": {
            "content_length": "809",
            "http_cache_control": "no-cache",
            "content_type": "application/xml",
            "http_postman_token": "c8d8b506-f8af-4ad0-bcf5-a0f53f9e8e05",
            "http_accept_encoding": "gzip, deflate",
            "request_method": "PUT",
            "request_path": "/",
            "http_version": "HTTP/1.1",
            "http_host": "localhost:8080",
            "http_user_agent": "PostmanRuntime/7.1.1",
            "request_uri": "/",
            "http_accept": "*/*",
            "http_connection": "keep-alive"
          }
        }
      }
    ]
  }
}

My example isn't the complete answer; it's just an example. What's your full pipeline config? It looks like your input is http_poller.

My input is http!
Here's my Logstash configuration:

input {
	http{
		codec => multiline {
			pattern => "<?xml"
			negate => true
			what => "previous"
		}
	}
}
filter {
	xml{		
		store_xml => false
		source => "message"
		xpath => ["/con/text()", "parsedCon"]
	}
}

output {
	elasticsearch {
		index => "parse"
		hosts => "localhost:9200"
	}
	stdout {
		codec => rubydebug
	}
}
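
A possible fix for the configuration above, offered as a sketch under stated assumptions rather than a tested setup. Two things stand out. First, the multiline pattern is a regular expression, so "<?xml" means "an optional <, then xml" and also matches the xmlns: lines, which would explain the split events and the _xmlparsefailure tag in the result. With the http input, each request body already arrives as one event, so the multiline codec is probably not needed at all (assuming one XML document per request). Second, "/con/text()" looks for con as the root element, but the root of the sample is m2m:cin, so the sketch uses "//con/text()" instead. The field name parsedCon is taken from the configuration above, and port 8080 matches the http_host shown in the result.

```
input {
  # Each request body is delivered as a single event,
  # so no multiline codec is configured here.
  http {
    port => 8080
  }
}

filter {
  xml {
    source => "message"
    store_xml => false
    # "//con" matches <con> anywhere in the document, below the m2m:cin root.
    xpath => ["//con/text()", "parsedCon"]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
```

If the request body really contains backslash-escaped quotes (\") as in the posted sample, it is not well-formed XML, and the quotes would need to be sent unescaped for the xml filter to parse it.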
