I'm new to Elasticsearch 5.0 and Logstash 5.0, and I want to parse a fairly large JSON file. I posted an example: Json File.
What I'm trying to do is split the JSON on Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.*, but I can't figure out how to do the split or how to make wildcards work.
The output I get has the following warning:
18:14:29.208 [[main]>worker0] WARN logstash.filters.split - Only String and Array types are splittable. field:Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.* is of type = NilClass
I don't understand what a split on Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.* even means. What does the input document look like and how do you want it transformed?
I can see in the pastebin text that the JSON is "pretty printed" over multiple lines.
Is your data like this? If so, this will not work with the JSON codec on the file input.
The file input with the json codec expects one JSON object per line, e.g.
{"foo":"bar"}
{"bar":"baz"}
For pretty-printed files you will need the multiline codec on the file input, with pattern => "^}", negate => true and what => "previous". The message field will then hold the JSON as a string, and you will need to apply the json filter to the message field. But that is where the difficulty kicks in.
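Putting the pieces above together, a minimal config sketch might look like this (the file path is hypothetical, and the multiline parameters are exactly the ones mentioned above — verify the pattern against your actual file layout):

```
input {
  file {
    # hypothetical path to the pretty-printed JSON file
    path => "/path/to/analysis.json"
    codec => multiline {
      # accumulate lines into one event until a line starting with "}" is seen
      pattern => "^}"
      negate => true
      what => "previous"
    }
  }
}

filter {
  # parse the accumulated string in "message" into structured fields
  json {
    source => "message"
  }
}
```

This only gets the whole document into one event; it does not yet split it into smaller events.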
Logstash does not know how to descend into this huge JSON document and split all fields matching "Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.*" into separate events.
The split filter has no equivalent of XPath.
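For comparison, the split filter only operates on a single, concrete string or array field; it cannot match multiple fields with a wildcard. A sketch, assuming a hypothetical array field (your DirInfos.XX entries are object keys, not an array, so this would not work on your data as-is):

```
filter {
  # split emits one event per element of the named array field;
  # the field reference must be exact -- wildcards are not supported
  split {
    field => "[Data][InstallAnalysis][Analysis][TCommServerInstallAnalysis][DirInfos]"
  }
}
```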
Hi Magnus,
thanks for the reply. I linked the input file in my first post. It is quite a big JSON file, so I posted it on pastebin (Link).
As you can see, there are objects called Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.RootInfo and Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.DirInfos.XX (where XX stands for a number). I want to split every object into its own event and relate the array FiletypeInfos to each object.
@guyboertje
This JSON is in a single line. Sorry for not mentioning that.
The JSON file is generated by software we developed ourselves. So you think it's better to split it into smaller pieces like:
As your own software is generating this: yes, I would generate one more focussed object per line.
Working backwards, what data do you want to have in Elasticsearch as the smallest document unit?
If you are able to output this as one JSON object per line, you will have reached the nirvana that every ETL developer dreams of. It means minimal handling in Logstash; in fact, if you do not need any enhancement (the T in ETL), I would look at trialling the new Ingest Node in ES 5 paired with Filebeat.
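With one-object-per-line output, the Filebeat-plus-Ingest-Node route could be sketched roughly like this (the path and pipeline name are hypothetical; the json.* options are Filebeat 5.x settings for decoding one JSON object per line):

```yaml
filebeat.prospectors:
- input_type: log
  paths:
    # hypothetical path to the newline-delimited JSON your software emits
    - /var/log/myapp/analysis.ndjson
  # let Filebeat decode each line as JSON and place keys at the event root
  json.keys_under_root: true
  json.add_error_key: true

output.elasticsearch:
  hosts: ["localhost:9200"]
  # optional: run events through an Elasticsearch ingest pipeline
  # (a hypothetical pipeline name; define it in ES beforehand)
  pipeline: "my-install-analysis-pipeline"
```

With this setup Logstash drops out of the picture entirely unless you later need heavier transformation.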