I have an input in the format below:
<data><te0><id>1</id><text>this is first event</text></te0><te1><id>2</id><text>this is second event</text></te1><te2><id>3</id><text>this is third event</text></te2><te0><id>4</id><text>this is fourth event</text></te0></data>
The above is a single input event to Logstash. I want to split this single event into multiple events and push only the te0 elements to Elasticsearch, each as a separate document identified by its id. So, for the above example, the result should be:
Say the index is xml_test:
xml_test/doc/1/: { "_id": 1, "text": "this is first event" }
xml_test/doc/4/: { "_id": 4, "text": "this is fourth event" }
Below is the Logstash config I am trying to use:
filter {
  grok {
    match => [ "message", "%{GREEDYDATA:inxml}" ]
  }
  xml {
    source => "inxml"
    target => "xmldata"
    force_array => false
  }
  json {
    source => "xmldata"
  }
  split {
    field => "xmldata"
  }
}
I am getting _jsonparsefailure, even though I can see in Elasticsearch that xmldata is already a JSON object, and also _split_type_failure, since split only works on a string or an array but it says xmldata is a hash.
Something like this:
"xmldata": {
  "te0": {
    "text": "this is a second doc",
    "id": "1"
  },
  "te2": {
    "text": " this is supposed to be 3rdor4th doc",
    "id": "2"
  }
},
and
"tags": [
  "_jsonparsefailure",
  "_split_type_failure"
],
How can I convert the above input to multiple docs, one for each set of values found only inside te0?
Hi,
I did post a sample event, but I'm not sure what happened to it. Here it is:
<data><te0><id>1</id><text>this is first event</text></te0><te1><id>2</id><text>this is second event</text></te1><te0><id>3</id><text>this is third event</text></te0><te2><id>4</id><text>this is fourth event</text></te2></data>
All I want is to push id 1 and id 3 to Elasticsearch as separate documents and drop everything other than te0.
So the output should index two documents into Elasticsearch. Below are the two docs:
doc 1 with id: 1 and doc as { "id":1, "text": "this is first event", "@timestamp":".....".......all metadata...}
doc 2 with id: 3 and doc as { "id":3, "text": "this is third event", "@timestamp":".....".......all metadata...}
I want only the members under xmldata with the key te0, and want to ignore everything else (there can be multiple te1, te2, te3 and so on). Most of them (te*) are arrays.
The xml filter merges all te0s into one array under a single te0 key, where the first occurrence of te0 in the XML becomes the first element of the array, and so on. Right now my split is creating documents with te0[0] plus the rest of the fields (te1, te2, ...), te0[1] plus the rest of the fields (te1, te2, ...), and so on.
What I want is only te0[0] as one document without any of the other te*s, and te0[1] as another document.
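To make the goal concrete, here is a rough, untested sketch of the kind of pipeline I am after. It assumes the xml filter (with force_array => false) produces [xmldata][te0] as an array of { id, text } hashes, which only holds when the event contains at least two te0 elements; with a single te0 it would be a hash and split would tag the event with _split_type_failure. The temporary top-level field name te0 and the hosts value are just placeholders, and xml_test is the index from my example.

```
filter {
  # Parse the raw XML straight from the message field into a nested hash.
  xml {
    source      => "message"
    target      => "xmldata"
    force_array => false
  }

  # Lift the te0 array out of xmldata, then drop xmldata so that
  # te1, te2, ... do not get copied into every split event.
  mutate {
    rename       => { "[xmldata][te0]" => "te0" }
    remove_field => [ "xmldata" ]
  }

  # One event per element of the te0 array (assumes te0 is an array).
  split {
    field => "te0"
  }

  # Flatten the single te0 hash into top-level id and text fields.
  mutate {
    rename => {
      "[te0][id]"   => "id"
      "[te0][text]" => "text"
    }
    remove_field => [ "te0" ]
  }
}

output {
  elasticsearch {
    hosts       => [ "localhost:9200" ]   # placeholder host
    index       => "xml_test"
    document_id => "%{id}"                # use the te0 id as the document _id
  }
}
```

With the second sample event above, I would expect this to yield two events, one with id 1 and one with id 3, each keeping @timestamp and the other metadata fields.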