Logstash json to multiple documents

I have input in the following format:

    <data><te0><id>1</id><text>this is first event</text></te0><te1><id>2</id><text>this is second event</text></te1><te2><id>3</id><text>this is third event</text></te2><te0><id>4</id><text>this is fourth event</text></te0></data>

The above is a single input event to Logstash. I want to split it into multiple events and push only the te0 elements to Elasticsearch, each as a document identified by its id. So, for the above example, with an index named xml_test, the result should be:

    xml_test/doc/1/: { "_id": 1, "text": "this is first event" }
    xml_test/doc/4/: { "_id": 4, "text": "this is fourth event" }
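To pin down the intended result independently of Logstash, here is a minimal sketch in plain Ruby, assuming Ruby's REXML library is available. It only illustrates which elements should become documents and what each should contain; it is not a replacement for the pipeline.

```ruby
require "rexml/document"

# The sample input from above (one Logstash event).
xml = "<data><te0><id>1</id><text>this is first event</text></te0>" \
      "<te1><id>2</id><text>this is second event</text></te1>" \
      "<te2><id>3</id><text>this is third event</text></te2>" \
      "<te0><id>4</id><text>this is fourth event</text></te0></data>"

doc = REXML::Document.new(xml)

# Keep only the <te0> children of <data>, in document order, and turn each one
# into the document to index (its id becomes the document _id).
docs = REXML::XPath.match(doc, "/data/te0").map do |te0|
  {
    "_id"  => te0.elements["id"].text.to_i,
    "text" => te0.elements["text"].text
  }
end

docs.each { |d| puts d.inspect }
```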

Below is the Logstash config I am trying to use:

    filter {
        grok {
            match => [ "message", "%{GREEDYDATA:inxml}" ]
        }
        xml {
            source => "inxml"
            target => "xmldata"
            force_array => false
        }
        json {
            source => "xmldata"
        }
        split {
            field => "xmldata"
        }
    }

I am getting a _jsonparsefailure tag, although I can see that xmldata is already JSON in Elasticsearch, and also a _split_type_failure tag, because split only works on a string or an array but xmldata is a hash. The event looks something like this:

    "xmldata": {
        "te0": {
            "text": "this is a second doc",
            "id": "1"
        },
        "te2": {
            "text": " this is supposed to be 3rdor4th doc",
            "id": "2"
        }
    },

and

    "tags": [
        "_jsonparsefailure",
        "_split_type_failure"
    ],
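Both failures come down to types. A minimal Ruby sketch of the reasoning, assuming the [xmldata] value shown above; the splittable? helper is hypothetical and only mirrors the split filter's documented rule that the field must be a string or an array.

```ruby
require "json"

# Roughly what the xml filter left in [xmldata] (copied from the output above).
xmldata = {
  "te0" => { "text" => "this is a second doc", "id" => "1" },
  "te2" => { "text" => " this is supposed to be 3rdor4th doc", "id" => "2" }
}

# Hypothetical helper: the field being split must be a String or an Array.
def splittable?(value)
  value.is_a?(String) || value.is_a?(Array)
end

# The json filter parses a string; xmldata is already a Hash, so there is
# nothing left to parse. Ruby's JSON.parse also raises when given a Hash.
json_error =
  begin
    JSON.parse(xmldata)
    nil
  rescue TypeError, JSON::ParserError => e
    e.class
  end

puts "JSON.parse on a Hash raised: #{json_error}"
puts "xmldata splittable? #{splittable?(xmldata)}"
puts "xmldata['te0'] splittable? #{splittable?(xmldata['te0'])}"
```

In other words, the json filter is not needed at all after the xml filter, and split has to point at something that is actually an array.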

How can I convert the above input to multiple docs for the values in attributes only inside

What does your data look like?

Hi,
I did post a sample event; not sure what happened to it. Here it is:

    <data><te0><id>1</id><text>this is first event</text></te0><te1><id>2</id><text>this is second event</text></te1><te0><id>3</id><text>this is third event</text></te0><te2><id>4</id><text>this is fourth event</text></te2></data>

Parsing your sample XML results in:

    "xmldata" => {
        "te0" => [
            [0] {
                "text" => "this is first event",
                  "id" => "1"
            },
            [1] {
                "text" => "this is third event",
                  "id" => "3"
            }
        ],
        "te1" => {
            "text" => "this is second event",
              "id" => "2"
        },
        "te2" => {
            "text" => "this is fourth event",
              "id" => "4"
        }
    },
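Why te0 turns into an array while te1 and te2 stay hashes can be shown with a simplified, hypothetical imitation of force_array => false in plain Ruby (using REXML). This is not the xml filter's own implementation; it only reproduces the grouping rule visible in the output above.

```ruby
require "rexml/document"

xml = "<data><te0><id>1</id><text>this is first event</text></te0>" \
      "<te1><id>2</id><text>this is second event</text></te1>" \
      "<te0><id>3</id><text>this is third event</text></te0>" \
      "<te2><id>4</id><text>this is fourth event</text></te2></data>"

# Turn an element like <te0><id>1</id><text>...</text></te0> into
# { "id" => "1", "text" => "..." }.
def leaf_hash(element)
  element.elements.to_a.to_h { |child| [child.name, child.text] }
end

root = REXML::Document.new(xml).root

# Group the children of <data> by element name: a name seen once becomes a
# Hash, a repeated name becomes an Array of Hashes in document order.
xmldata = root.elements.to_a.group_by(&:name).transform_values do |elements|
  hashes = elements.map { |e| leaf_hash(e) }
  hashes.length == 1 ? hashes.first : hashes
end

xmldata.each { |name, value| puts "#{name} => #{value.class}" }
```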

You say you want ids 1 and 4. What test do you use to drop ids 2 and 3?

All I want is to push ids 1 and 3 to Elasticsearch as separate documents and remove everything other than te0.
So the output should index two documents into Elasticsearch:

    doc 1 with _id 1: { "id": 1, "text": "this is first event", "@timestamp": "....." ...all metadata... }
    doc 2 with _id 3: { "id": 3, "text": "this is third event", "@timestamp": "....." ...all metadata... }

So if you iterate over the members of xmldata, you want to ignore any that are not arrays? And if they are arrays, split them?

I want only the te0 members under xmldata and to ignore everything else (there can be multiple te1, te2, te3 and so on, and most of the te* members are arrays).

The xml filter merges all te0 elements into one array under a single te0 key, with the first te0 in the XML as the first element of the array, and so on. Now my split creates one document with te0[0] plus the rest of the fields (te1, te2, ...), another with te0[1] plus the rest of the fields, and so on.
What I want is only te0[0] as one document, without any of the other te* fields, and te0[1] as another document.
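The leftover te1/te2 fields follow from what split does: it copies the whole event once per array element. A rough plain-Ruby imitation, assuming the event shape from the parse output above; split_te0 is a hypothetical helper, not the split filter's code.

```ruby
# Hypothetical event after the xml filter (shape copied from the parse output above).
event = {
  "xmldata" => {
    "te0" => [
      { "text" => "this is first event", "id" => "1" },
      { "text" => "this is third event", "id" => "3" }
    ],
    "te1" => { "text" => "this is second event", "id" => "2" },
    "te2" => { "text" => "this is fourth event", "id" => "4" }
  }
}

# Rough imitation of split { field => "[xmldata][te0]" }: one deep copy of the
# whole event per array element, with the array replaced by that element.
# Nothing else is removed, so te1 and te2 travel along with every copy.
def split_te0(event)
  event["xmldata"]["te0"].map do |element|
    copy = Marshal.load(Marshal.dump(event))
    copy["xmldata"]["te0"] = element
    copy
  end
end

split_events = split_te0(event)
split_events.each { |e| puts e["xmldata"].keys.inspect }
```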

Try

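    ruby {
        code => '
            # Remove every member of [xmldata] except te0 (te1, te2, ...).
            event.get("xmldata").each_key { |k|
                unless k == "te0"
                    event.remove("[xmldata][#{k}]")
                end
            }
        '
    }
    # Emit one event per element of the [xmldata][te0] array.
    split { field => "[xmldata][te0]" }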
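The effect of the two filters above can be checked on a plain Ruby hash. This is a sketch that assumes the event shape from the parse output earlier; the wrapping of a single te0 hash into an array is an extra assumption (with only one te0 element, force_array => false yields a hash, which split would reject), not part of the suggested config.

```ruby
# Hypothetical event after the xml filter (shape copied from the parse output above).
event = {
  "xmldata" => {
    "te0" => [
      { "text" => "this is first event", "id" => "1" },
      { "text" => "this is third event", "id" => "3" }
    ],
    "te1" => { "text" => "this is second event", "id" => "2" },
    "te2" => { "text" => "this is fourth event", "id" => "4" }
  }
}

# Step 1, like the ruby filter: keep only the te0 member of xmldata.
event["xmldata"].select! { |name, _| name == "te0" }

# Assumption beyond the answer: a single <te0> arrives as a Hash, so wrap it
# to keep the next step uniform.
te0 = event["xmldata"]["te0"]
te0 = [te0] unless te0.is_a?(Array)

# Step 2, like split { field => "[xmldata][te0]" }: one event per element
# (other top-level fields such as @timestamp are omitted here).
events = te0.map { |element| { "xmldata" => { "te0" => element } } }

events.each do |e|
  doc = e["xmldata"]["te0"]
  puts "#{doc['id']}: #{doc['text']}"
end
```

In the real pipeline, the elasticsearch output's document_id option accepts a field reference such as "%{[xmldata][te0][id]}", which would give each split event the _id asked for. Moving id and text to the top level, as in the desired documents, would still need an extra step, for example mutate's rename option.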
