Logstash 5.0 parsing JSON


(Fabian Geiger) #1

I'm new to Elasticsearch 5.0 with Logstash 5.0 and I want to parse a fairly big JSON file. I put an example on Json File.

What I am trying to do is split the JSON on Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.*, but I can't figure out how to split it or how to make wildcards work.

My configuration looks like this:

input {
  file {
    path => "/home/agenda/json/json.txt"
    start_position => "beginning"
    codec => json { }
    sincedb_path => "/etc/logstash.central/jsonloc"
  }
}

filter {
  json {
    source => "message"
  }
  split {
    field => "Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.*"
  }
}

The output I get has the following issue:
18:14:29.208 [[main]>worker0] WARN logstash.filters.split - Only String and Array types are splittable. field:Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.* is of type = NilClass

Can someone help me?

Thanks


(Magnus Bäck) #2

I don't understand what a split on Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.* even means. What does the input document look like and how do you want it transformed?


(Guy Boertje) #3

I can see in the pastebin text that the JSON is "pretty printed" over multiple lines.

Is your data like this? If so, this will not work with the JSON codec on the file input.

The file input + json codec wants one JSON object per line, e.g.

{"foo":"bar"}
{"bar":"baz"}

For pretty-printed files you will need the multiline codec on the file input with pattern => "^}", what => "previous", and negate => true. The message field will then contain the JSON as a string, and you will need to apply the json filter to the message field.

But then the real difficulty kicks in: Logstash does not know how to descend into this huge JSON document and split all fields matching "Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.*" into separate events. The split filter has no equivalent of XPath.
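Putting that together, here is a minimal sketch of the multiline approach (the path is copied from the earlier config; the multiline options are as described above, and the comments are my reading of them):

```conf
input {
  file {
    path => "/home/agenda/json/json.txt"
    start_position => "beginning"
    codec => multiline {
      pattern => "^}"     # a closing brace in column 1 ends the pretty-printed object
      negate => true
      what => "previous"  # every non-matching line is appended to the previous event
    }
  }
}

filter {
  json {
    # the multiline codec leaves the whole object as a string in "message"
    source => "message"
  }
}
```

This only reassembles the pretty-printed object into one event; it does not solve the wildcard-split problem.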


(Fabian Geiger) #4

Hi Magnus,
thanks for the reply. I linked the input file in my first post. It is quite a big JSON file, so I posted it on pastebin (Link).
As you can see, there are objects called Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.RootInfo and Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.DirInfos.XX (where XX stands for a number). I want to split out every object and relate the array FiletypeInfos to each object.

@guyboertje
This JSON is on a single line; sorry for not mentioning that.
The JSON file is generated by software we developed ourselves. So you think it's better to split it into smaller pieces, like:

{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis": {"SerialVersionUID": 59934019,"Timestamp": 42744.5537503009,"ElapsedTime": 1255,"RecordedFailures": [],"TotalFailureCount": 0,"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.RootInfo": {"SerialVersionUID": 64013469,"DirCount": 1,"FiletypeInfos": [{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.RootInfo.FiletypeInfos.0": {"SerialVersionUID": 54176878,"FileExtension": ".chr","FileCount": 4,"Size": 107000}},{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.RootInfo.FiletypeInfos.1": {"SerialVersionUID": 54176878,"FileExtension": ".drv","FileCount": 3,"Size": 941536}}],"FileCount": 7,"Size": 1048536}}}
{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis": {"SerialVersionUID": 59934019,"Timestamp": 42744.5537503009,"ElapsedTime": 1255,"RecordedFailures": [],"TotalFailureCount": 0,"DirInfos": [{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.DirInfos.0": {"SerialVersionUID": 64013469,"DirCount": 2,"FiletypeInfos": [{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.DirInfos.0.FiletypeInfos.0": {"SerialVersionUID": 54176878,"FileExtension": ".dll","FileCount": 119,"Size": 451199880}}],"FileCount": 119,"Size": 451199880}}]}}

(Guy Boertje) #5

To clarify: is each JSON object one "line" in its own file, or do you have multiple JSON objects, one per line, in a single file?


(Fabian Geiger) #6

Here is the unformatted input file: Link
The whole json document is in one line.


(Guy Boertje) #7

As your own software is generating this, then yes, I would generate one more focused object per line.

Working backwards: what data do you want to have in Elasticsearch as the smallest document unit?
If you are able to output one JSON object per line, you will have reached the nirvana that every ETL developer dreams of :slight_smile:. It means minimal handling in Logstash; in fact, if you do not need any transformation (the T in ETL), I would look at trialling the new Ingest Node in ES 5 paired with Filebeat.
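For example, once each line is a self-contained object like the two sketched in post #4, the json codec parses it directly and the split filter can fan out an array field into one event per element. A sketch only: the bracketed field path below is read off the example structure, whose key names literally contain dots, which is why bracket syntax is needed:

```conf
input {
  file {
    path => "/home/agenda/json/json.txt"
    start_position => "beginning"
    codec => json { }   # one JSON object per line, parsed per event
  }
}

filter {
  split {
    # one event per element of the FiletypeInfos array; bracket syntax
    # treats each dotted key name as a literal key, not a nested path
    field => "[Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis][Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.RootInfo][FiletypeInfos]"
  }
}
```

Whether this exact path is right depends on the final structure your software emits; the point is that split works on a concrete array field, not on a wildcard.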


(Guy Boertje) #8

On the other hand, you could develop your own plugin that chops up the one huge event into smaller events and cancels the large one.
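Short of writing a full plugin, the stock ruby filter can sketch the idea, assuming the inline code has access to `new_event_block` as described in the ruby filter documentation. The field names and the structure being walked here are illustrative, not taken from the real file:

```conf
filter {
  ruby {
    code => "
      root = event.get('[Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis]')
      if root.is_a?(Hash)
        root.each do |key, value|
          next unless value.is_a?(Hash)
          # emit one new event per nested object (illustrative field names)
          e = event.clone
          e.set('section', key)
          e.set('data', value)
          new_event_block.call(e)
        end
        event.cancel  # drop the original huge event
      end
    "
  }
}
```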

Much of this discussion will revolve around:

  • whether you want your application to take on the "chopping up" workload;
  • where you want control over exactly what goes into Elasticsearch: in your application or in Logstash;
  • whether some documents should go to different ES indices;
  • how often the original JSON object structure will change;
  • whether you need to enrich the event before indexing into ES.

(Fabian Geiger) #9

Sorry for the late reply. We created a new JSON file and now I am able to split it how I want to. Thanks for your advice!


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.