Logstash 5.0 parsing JSON


(Fabian Geiger) #1

I'm new to Elasticsearch 5.0 with Logstash 5.0 and I want to parse a fairly big JSON file. I put an example on Json File.

What I am trying to do is split the JSON on Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.*, but I can't figure out how to split it or how to make wildcards work.

My configuration looks like this:

input {
  file {
    path => "/home/agenda/json/json.txt"
    start_position => "beginning"
    codec => json { }
    sincedb_path => "/etc/logstash.central/jsonloc"
  }
}

filter {
  json {
    source => "message"
  }
  split {
    field => "Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.*"
  }
}

The output I get has the following issue:
18:14:29.208 [[main]>worker0] WARN logstash.filters.split - Only String and Array types are splittable. field:Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.* is of type = NilClass

Can someone help me?

Thanks


(Magnus Bäck) #2

I don't understand what a split on Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.* even means. What does the input document look like and how do you want it transformed?


(Guy Boertje) #3

I can see in the pastebin text that the JSON is "pretty printed" over multiple lines.

Is your data like this? If so, this will not work with the JSON codec on the file input.

The file input + json codec wants one JSON object per line, e.g.

{"foo":"bar"}
{"bar":"baz"}

For pretty-printed files you will need the multiline codec on the file input with pattern => "^}", what => "previous", and negate => true. The message field will then contain the JSON as a string, and you will need to apply the json filter to the message field.

But then the real difficulty kicks in: Logstash does not know how to descend into this huge JSON document and split all fields matching "Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.*" into separate events. The split filter has no equivalent of XPath.
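Putting that together, here is a minimal sketch of the multiline approach (the path is copied from the earlier config; the multiline options are as described above, and the comments are my reading of them):

```conf
input {
  file {
    path => "/home/agenda/json/json.txt"
    start_position => "beginning"
    codec => multiline {
      pattern => "^}"     # a closing brace in column 1 ends the pretty-printed object
      negate => true
      what => "previous"  # every non-matching line is appended to the previous event
    }
  }
}

filter {
  json {
    # the multiline codec leaves the whole object as a string in "message"
    source => "message"
  }
}
```

This only reassembles the pretty-printed object into one event; it does not solve the wildcard-split problem.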


(Fabian Geiger) #4

Hi Magnus,
thanks for the reply. I linked the input file in my first post. It is quite a big JSON file, so I posted it on pastebin (Link).
As you can see, there are objects called Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.RootInfo and Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.DirInfos.XX (where XX stands for a number). I want to split out every object and relate the array FiletypeInfos to each object.

@guyboertje
This JSON is on a single line; sorry for not mentioning that.
The JSON file is generated by software we developed ourselves. So you think it's better to split it into smaller pieces, like:

{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis": {"SerialVersionUID": 59934019,"Timestamp": 42744.5537503009,"ElapsedTime": 1255,"RecordedFailures": [],"TotalFailureCount": 0,"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.RootInfo": {"SerialVersionUID": 64013469,"DirCount": 1,"FiletypeInfos": [{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.RootInfo.FiletypeInfos.0": {"SerialVersionUID": 54176878,"FileExtension": ".chr","FileCount": 4,"Size": 107000}},{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.RootInfo.FiletypeInfos.1": {"SerialVersionUID": 54176878,"FileExtension": ".drv","FileCount": 3,"Size": 941536}}],"FileCount": 7,"Size": 1048536}}}
{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis": {"SerialVersionUID": 59934019,"Timestamp": 42744.5537503009,"ElapsedTime": 1255,"RecordedFailures": [],"TotalFailureCount": 0,"DirInfos": [{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.DirInfos.0": {"SerialVersionUID": 64013469,"DirCount": 2,"FiletypeInfos": [{"Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.DirInfos.0.FiletypeInfos.0": {"SerialVersionUID": 54176878,"FileExtension": ".dll","FileCount": 119,"Size": 451199880}}],"FileCount": 119,"Size": 451199880}}]}}

(Guy Boertje) #5

To clarify: is each JSON object one "line" in its own file, or do you have multiple JSON objects, one per line, in a single file?


(Fabian Geiger) #6

Here is the unformatted input file: Link
The whole json document is in one line.


(Guy Boertje) #7

As your own software is generating this, then yes, I would generate one more focused object per line.

Working backwards: what data do you want to have in Elasticsearch as the smallest document unit?
If you are able to output one JSON object per line, you will have reached the nirvana that every ETL developer dreams of :slight_smile:. It means minimal handling in Logstash; in fact, if you do not need any transformation (the T in ETL), I would look at trialling the new Ingest Node in ES 5 paired with Filebeat.
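For example, once each line is a self-contained object like the two sketched in post #4, the json codec parses it directly and the split filter can fan out an array field into one event per element. A sketch only: the bracketed field path below is read off the example structure, whose key names literally contain dots, which is why bracket syntax is needed:

```conf
input {
  file {
    path => "/home/agenda/json/json.txt"
    start_position => "beginning"
    codec => json { }   # one JSON object per line, parsed per event
  }
}

filter {
  split {
    # one event per element of the FiletypeInfos array; bracket syntax
    # treats each dotted key name as a literal key, not a nested path
    field => "[Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis][Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis.RootInfo][FiletypeInfos]"
  }
}
```

Whether this exact path is right depends on the final structure your software emits; the point is that split works on a concrete array field, not on a wildcard.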


(Guy Boertje) #8

On the other hand, you could develop your own plugin that chops up the one huge event into smaller events and cancels the large one.
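Short of writing a full plugin, the stock ruby filter can sketch the idea, assuming the inline code has access to `new_event_block` as described in the ruby filter documentation. The field names and the structure being walked here are illustrative, not taken from the real file:

```conf
filter {
  ruby {
    code => "
      root = event.get('[Data.InstallAnalysis.Analysis.TCommServerInstallAnalysis]')
      if root.is_a?(Hash)
        root.each do |key, value|
          next unless value.is_a?(Hash)
          # emit one new event per nested object (illustrative field names)
          e = event.clone
          e.set('section', key)
          e.set('data', value)
          new_event_block.call(e)
        end
        event.cancel  # drop the original huge event
      end
    "
  }
}
```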

Much of this discussion will revolve around:

  • whether you want your application to take on the "chopping up" workload;
  • where you want control over exactly what goes into Elasticsearch: in your application or in Logstash;
  • whether some documents should go to different ES indices;
  • how often the original JSON object structure will change;
  • whether you need to enrich the event before indexing into ES.

(Fabian Geiger) #9

Sorry for the late reply. We created a new JSON file and now I am able to split it how I want to. Thanks for your advice!


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.