I am sourcing some documents from a remote cluster and storing them in files on a local machine. I would like to store the documents as valid JSON that I can later read in, for example, in Python.
I used the file output plugin, which uses json_lines as its default codec. However, this doesn't do the job: it saves the documents in a newline-delimited format, which is not valid JSON.
Has anyone faced this issue before? Any leads on this?
The desired format is:
[
  doc1,
  doc2,
  ...
]
Thanks.
This may be what you want:
# bin/logstash-plugin install logstash-filter-uuid
filter {
  uuid {
    target => "uuid"
  }
}
output {
  file {
    path => "/tmp/%{uuid}.json"
    codec => "json"
  }
}
...however, if you have a lot of files I would not recommend this; you can take out your OS or file system trying to add millions of little files to a single directory. You may want to consider an intermediary system such as MySQL or Kafka (or even Elasticsearch), or maybe batch them together in a single file with a codec such as Avro or Protobuf (or even json_lines) that your later application (say, Python) can decode.
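For example, here is a minimal sketch of how a later Python step could decode such a json_lines file; the /tmp/docs.json path is only a placeholder for whatever path your file output writes to:

import json

# Each line written by the json_lines codec is one complete JSON document,
# so the file can be decoded line by line.
docs = []
with open("/tmp/docs.json") as f:  # placeholder path
    for line in f:
        line = line.strip()
        if line:  # skip blank lines such as a trailing newline
            docs.append(json.loads(line))

# docs is now a list of dicts, one per Logstash event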
I edited the post; that should make clear what I am trying to achieve. Thanks for the prompt reply!
How would Logstash know when to write the final ]?
Any program that's capable of reading
[
{...},
{...}
...
]
would also be capable of reading this:
{...}
{...}
...
Depending on the size of the file, the latter might also be more efficient.
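To illustrate (a rough sketch, with /tmp/docs.json as a placeholder path): a Python consumer would use json.load for the bracketed format, while the newline-delimited format can be decoded one document at a time:

import json

def read_array(path):
    # Bracketed format: the whole file has to be parsed in one go.
    with open(path) as f:
        return json.load(f)

def read_lines(path):
    # Newline-delimited format: decode one document per line, so the
    # whole file never has to be held in memory at once.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

for doc in read_lines("/tmp/docs.json"):  # placeholder path
    pass  # process each document as it is read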
Sure, I just kept it that way. Thanks for the replies.