Logstash split xml into array


(Evaldas Buinauskas) #1

Is it possible to convert XML into an array of objects using Logstash?

Here is my sample document:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "Metadata" : "<root><Tags><TagTypeID>1</TagTypeID><TagValue>twitter</TagValue></Tags><Tags><TagTypeID>1</TagTypeID><TagValue>facebook</TagValue></Tags><Tags><TagTypeID>2</TagTypeID><TagValue>usa</TagValue></Tags><Tags><TagTypeID>3</TagTypeID><TagValue>smartphones</TagValue></Tags></root>"
}

Ideally, I'd like to output this:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "Metadata" : [
    {
      "TagTypeID" : "1",
      "TagValue" : "twitter"
    },
    {
      "TagTypeID" : "1",
      "TagValue" : "facebook"
    },
    {
      "TagTypeID" : "2",
      "TagValue" : "usa"
    },
    {
      "TagTypeID" : "3",
      "TagValue" : "smartphones"
    }
  ]
}

However, I'm not able to achieve that. I tried using the xml filter like this:

xml
{
	source => "Metadata"
	target => "Parsed"
}
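One possible fix is a minimal sketch that assumes your installed logstash-filter-xml plugin supports the `force_array` option. That option defaults to `true`, and the default is what wraps every value in a one-element array. With `force_array => false`, leaf values should come back as plain strings. The `mutate` step then moves the parsed list back into `Metadata`. One caveat: with `force_array => false`, a document that contains only a single `<Tags>` element would likely be parsed as one object rather than a one-element array.

```
filter {
  xml {
    source       => "Metadata"
    target       => "Parsed"
    # Don't wrap each child element's value in an array
    force_array  => false
    # Common option: only applied if the XML was parsed successfully
    remove_field => ["Metadata"]
  }
  mutate {
    # Move the list of <Tags> objects into the original field name
    rename       => { "[Parsed][Tags]" => "Metadata" }
    # Drop the now-empty wrapper object (runs after the rename)
    remove_field => ["Parsed"]
  }
}
```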

However, it outputs this:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "@version" : "1",
  "@timestamp" : "2015-10-27T17:21:31.961Z",
  "Parsed" : {
    "Tags" : [
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["twitter"]
      },
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["facebook"]
      },
      {
        "TagTypeID" : ["2"],
        "TagValue" : ["usa"]
      },
      {
        "TagTypeID" : ["3"],
        "TagValue" : ["smartphones"]
      }
    ]
  }
}

I don't want my values to be stored as arrays (I know there's always going to be just one value there).

I know which fields are going to come back from my input, so I can map the structure myself, and this doesn't need to be dynamic (although that would be nice).
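For the dynamic variant, a small `ruby` filter placed after the original `xml` filter could unwrap every one-element array without naming the fields. This is a sketch under assumptions: it relies on the Logstash 5.x+ event API (`event.get`, `event.set`, `event.remove`), while older releases used `event['field']` instead. The field names `[Parsed][Tags]` and `Metadata` are taken from the example above.

```
filter {
  ruby {
    code => '
      tags = event.get("[Parsed][Tags]")
      if tags.is_a?(Array)
        flat = tags.map do |tag|
          next tag unless tag.is_a?(Hash)
          # Replace each one-element array value with its only element
          Hash[tag.map { |k, v| [k, (v.is_a?(Array) && v.size == 1) ? v.first : v] }]
        end
        # Store the flattened list under the original field name
        event.set("Metadata", flat)
        event.remove("Parsed")
      end
    '
  }
}
```

Values that really contain several elements are left as arrays, so the filter stays safe if the input ever changes.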

The split filter ("Allow splitting of lists / arrays into multiple events") seemed useful, but it's poorly documented, and I couldn't find information on how to use it for my use case.
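For comparison, the split filter works on the whole event rather than inside it. Given an array field, it emits one copy of the event per element, so the tags would end up in separate events instead of a single event holding an array. This minimal sketch assumes the `[Parsed][Tags]` field produced by the xml filter configuration shown earlier.

```
filter {
  split {
    # One output event per element of this array; in each copy
    # the field holds only that single element
    field => "[Parsed][Tags]"
  }
}
```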

http://stackoverflow.com/questions/26362303/logstash-split-event-from-an-xml-file-in-multiples-documents-keeping-informatio is similar, but not exactly what I'd like to achieve.

http://stackoverflow.com/questions/31880172/logstash-xml-to-json-output-from-array-to-string seems useful; however, it hardcodes that only the first element of the array is output as a single item (not as part of an array). It gives me this:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "@version" : "1",
  "@timestamp" : "2015-10-27T17:21:31.961Z",
  "Parsed" : {
    "Tags" : [
      {
        "TagTypeID" : "1",
        "TagValue" : "twitter"
      },
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["facebook"]
      },
      {
        "TagTypeID" : ["2"],
        "TagValue" : ["usa"]
      },
      {
        "TagTypeID" : ["3"],
        "TagValue" : ["smartphones"]
      }
    ]
  }
}

  1. Can this be done without having to create custom filters? (I have no experience with Ruby.)
  2. Or am I missing something basic here?

Could someone help?

