I am sourcing some documents from a remote cluster and storing them in files on a local machine. I would like to store the documents as valid JSON that I can later read in, for example, in Python.
I used the file output plugin, which uses json_lines as its default codec. This, however, doesn't seem to do the job: it saves the documents in a newline-delimited format, which is not a single valid JSON document.
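To illustrate what I mean, here is a minimal sketch (the file contents are hypothetical examples, not my actual documents) of why the json_lines output cannot be parsed as one JSON document in Python:

```python
import json

# Hypothetical contents of a file written with the json_lines codec:
# each event is a standalone JSON object, separated by newlines.
raw = '{"id": 1, "msg": "first event"}\n{"id": 2, "msg": "second event"}\n'

# Parsing the whole file as one JSON value fails, because the file
# is a sequence of JSON objects rather than a single JSON document.
try:
    json.loads(raw)
    print("parsed as a single document")
except json.JSONDecodeError:
    print("not a single valid JSON document")
```

Each individual line is valid JSON, but the file as a whole is not.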
Has anyone faced this issue before? Any leads on this?
...however, if you have a lot of files, I would not recommend this; you can take out your OS or file system by trying to add millions of little files to a single directory. You may want to consider an intermediary system such as MySQL or Kafka (or even Elasticsearch), or batch the documents together in a single file with a codec such as Avro or Protobuf (or even JSON Lines) that your downstream application (say, Python) can decode.