After trying for a long time to get nested JSON working, I've just created a flat one - http://pastebin.com/0VVF4HNZ.
I feel like this should be trivial, but trying `codec => json` on the input, the json filter, and some attempts at multiline (all from these forums and Stack Overflow) has got me nowhere. I could just write a script that uses `curl -XPUT` to work its way down the file, as that works, but this looks like exactly what Logstash was designed to do.
I just get a string of errors like:
[2016-11-09T16:36:47,737][ERROR][logstash.codecs.json ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: incompatible json object type=java.lang.String, only hash map or arrays are supported>, :data=>"\t\"EntityType\": 3,"}
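For reference, the json codec on a file input parses each line as a complete JSON document, so a pretty-printed (multi-line) object produces exactly this kind of error on every fragment line. A minimal input sketch, assuming the file already holds one complete object per line (the path here is illustrative):

```
input {
  file {
    path => "/var/log/myapp/output.json"
    start_position => "beginning"
    codec => "json"
  }
}
```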
Thanks for the reply, I've modified the file and it works perfectly now.
I'm not creating it myself though, I'm pulling it from an API, so it'll be another script to write to get it all onto one line. I guess since it's being parsed at the input stage there are no mutate tricks I can do, as that's at the filter level?
If not I'll work with what I have and modify the log before Logstash gets hold of it. Thanks again.
That was what I was asking - there are so many features (like the multiline input) that I was hoping I could use one of them. Doing the operation before the data comes in is fine though.
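For the record, one built-in option that can sometimes avoid pre-processing is the multiline codec: if every record in the file starts with `{` at the beginning of a line, continuation lines can be folded into the previous event and then parsed with the json filter. This is a hedged sketch that I have not tested against the pastebin data, and the path is illustrative:

```
input {
  file {
    path => "/var/log/myapp/input.json"
    start_position => "beginning"
    codec => multiline {
      # any line that does not start a new object belongs to the previous event
      pattern => "^{"
      negate => true
      what => "previous"
    }
  }
}
filter {
  json { source => "message" }
}
```

One caveat: the last object in the file only flushes when the next one arrives, unless something like the codec's `auto_flush_interval` option is set.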
Cheers.
This might not be the right forum for this, but in case other people are trying to do the same as me.
I created a quick and dirty Python script that takes nested JSON and creates an output that a `file` input with `codec => json` likes.
> import json
> from flatten_json import flatten  # pip install flatten_json
>
> with open('input.json') as data_file:
>     data = json.load(data_file)
> flat = flatten(data)
>
> with open('output.json', 'w') as f:
>     for i in range(2000):  # number of records (should be automated)
>         prefix = '%d_' % i
>         # collect this record's keys and drop the "<i>_" prefix
>         record = {key[len(prefix):]: flat[key]
>                   for key in flat if key.startswith(prefix)}
>         f.write(json.dumps(record) + '\n')
I'm sure there's a better way of doing it, and it's 100% unsupported , but anyone else searching for a solution might find it workable.
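If pulling in the flatten_json dependency is undesirable, the same flatten-then-one-object-per-line (NDJSON) conversion can be sketched with just the standard library. The `flatten` helper below is my own illustrative version, not the package's, and it assumes the API returns a JSON array of objects:

```python
import json

def flatten(obj, prefix=''):
    """Recursively flatten nested dicts/lists into a single-level dict
    with underscore-joined keys, e.g. {"a": {"b": 1}} -> {"a_b": 1}."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, prefix + key + '_'))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, prefix + str(index) + '_'))
    else:
        flat[prefix[:-1]] = obj
    return flat

def to_ndjson(records):
    """Serialise each record as one JSON object per line."""
    return ''.join(json.dumps(flatten(record)) + '\n' for record in records)
```

Something like `to_ndjson(json.load(open('input.json')))` can then be written straight to the file Logstash tails.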