Unable to ingest JSON file and output to stdout

I'm new to Logstash, and I've attempted to start with a simple JSON file, but I can't get anything to output to stdout. It just hangs after Successfully started Logstash API endpoint {:port=>9646}.

I've looked at a few posts to try to get some ideas, to no avail.

I've been able to get Logstash to work with the input plugins stdin and exec, but I really would like to get it to work with a .json file. Here's what I have:

test.json:

{
  "message": "test"
}

logstash.conf:

input {
  file {
    codec => json
    path => ["/db/seed/test.json"]
    start_position => "beginning"
  }
}

output {
  stdout {
    codec => "rubydebug"
  }
}

Output:

[2020-04-06T18:28:59,741][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.queue", :path=>"/Users/<project_path>/data/logstash/2020-04-06_22-28-40/queue"}
[2020-04-06T18:28:59,872][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.dead_letter_queue", :path=>"/Users/<project_path>/data/logstash/2020-04-06_22-28-40/dead_letter_queue"}
[2020-04-06T18:28:59,971][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2020-04-06T18:28:59,982][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.6.2"}
[2020-04-06T18:29:00,013][INFO ][logstash.agent           ] No persistent UUID file found. Generating new UUID {:uuid=>"dbafbee1-c8e9-44bf-889c-0cc21921765b", :path=>"/Users/<project_path>/data/logstash/2020-04-06_22-28-40/uuid"}
[2020-04-06T18:29:01,781][INFO ][org.reflections.Reflections] Reflections took 39 ms to scan 1 urls, producing 20 keys and 40 values 
[2020-04-06T18:29:03,339][WARN ][org.logstash.instrument.metrics.gauge.LazyDelegatingGauge][main] A gauge metric of an unknown type (org.jruby.RubyArray) has been created for key: cluster_uuids. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.
[2020-04-06T18:29:03,400][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["/Users/<project_path>/logstash.conf"], :thread=>"#<Thread:0x796af4b2 run>"}
[2020-04-06T18:29:04,480][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
[2020-04-06T18:29:04,542][INFO ][filewatch.observingtail  ][main] START, creating Discoverer, Watch with file and sincedb collections
[2020-04-06T18:29:04,556][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2020-04-06T18:29:04,971][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9646}

All in all, the file input plugin looks like it should be quite simple, but here I am. Any insight or advice would be greatly appreciated.

Hello @leosoaivan

I have a few comments:

  1. The test.json file contains a multiline JSON document. Logstash will be able to parse the file with the json codec if it contains one JSON document per line, such as:

    { "message": "test" }
    { "message": "another test" }
    

    The format you're trying to parse is not straightforward to split into valid JSON documents, as there is no clear separator (unless we assume each document always starts with a {\n and ends with a }\n, with nothing else in between). If that is the case, you might first use a multiline codec, followed by a json filter; see the sketch after this list.

  2. As explained in our documentation, Logstash keeps a sincedb file to track the last line read. If you've run Logstash more than once, it is possible Logstash has already marked the lines as read. You can customize the sincedb_path or, if needed, remove the file (by default located at <path.data>/plugins/inputs/file).
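
For reference, here is a minimal sketch of the multiline approach from point 1. It assumes each JSON document starts with a { at the beginning of a line; the auto_flush_interval and the /dev/null sincedb_path are illustrative choices for local testing, not required settings:

input {
  file {
    path => ["/db/seed/test.json"]
    start_position => "beginning"
    # Illustrative: discard sincedb state so every test run re-reads the file
    sincedb_path => "/dev/null"
    codec => multiline {
      # Lines that do NOT start with "{" are appended to the previous event
      pattern => "^\{"
      negate => true
      what => "previous"
      # Flush the final event even if no new "{" line ever arrives
      auto_flush_interval => 1
    }
  }
}

filter {
  # Parse the reassembled multiline string into structured fields
  json {
    source => "message"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}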

@Luca_Belluccini thanks for the quick reply.

I noted your first comment. Thank you for that tip.

As for sincedb_path, it appears that Logstash generates one based on the path.data flag I pass on the CLI. I had set path.data to a dynamically timestamped directory ($PWD/data/logstash/$TIMESTAMP), as I was running into the Logstash could not be started because there is already another instance using the configured data directory error.

In the end, I see the sincedb_path file being created, though it's empty.

I don't know if this is helpful, but the script I use to run Logstash within the project is as follows:

#!/usr/bin/env bash

TIMESTAMP=$(date -u +%Y-%m-%d_%H-%M-%S)

# Data and log directories
export DATA_DIR="$PWD/data/logstash/$TIMESTAMP"
export LOG_DIR="$PWD/log/logstash/$TIMESTAMP"

# Run Logstash
logstash -f "$PWD/logstash.conf" --path.data "$DATA_DIR" --path.logs "$LOG_DIR"

Hello @leosoaivan

I would discourage changing the DATA_DIR and LOG_DIR on each run.

By default, the sincedb file is located inside path.data, but I would suggest specifying its location explicitly in the file input (doc).

Customizing those parameters (path.data and path.logs) is useful if you're running multiple Logstash instances on the same host.

input {
  file {
    codec => json
    path => ["/db/seed/test.json"]
    # sincedb_path must point to a file, not a directory
    sincedb_path => "/some/location/sincedb"
    start_position => "beginning"
  }
}

@Luca_Belluccini, thanks for your input.

I've removed the timestamped path.data and I've revised the input as follows:

input {
  file {
    codec => json
    path => ["/db/seed/test.json"]
    sincedb_path => "logstash/data/plugins/inputs/file/sincedb_file"
    start_position => "beginning"
  }
}

Unfortunately, the results are still the same. I've also added --log.level=debug and am seeing the following messages repeat after the Logstash API endpoint has started:

[2020-04-07T16:17:31,712][DEBUG][logstash.instrument.periodicpoller.cgroup] One or more required cgroup files or directories not found: /proc/self/cgroup, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/cpu
[2020-04-07T16:17:31,917][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2020-04-07T16:17:31,920][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2020-04-07T16:17:35,855][DEBUG][org.logstash.execution.PeriodicFlush][main] Pushing flush onto pipeline.

The logs you've shared are normal; they are triggered by the internal collection of monitoring information.

For the sincedb_path, use an absolute path to a file (and verify the file actually gets created). Likewise, make sure the path option contains the full, absolute path to your input file.

If you wish, you can also use environment variables in the pipeline file (doc).
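
For example, assuming an environment variable named SINCEDB_PATH is exported before Logstash starts (the variable name here is only an example), a sketch could be:

input {
  file {
    path => ["/db/seed/test.json"]
    # ${SINCEDB_PATH} is substituted from the environment when the pipeline loads
    sincedb_path => "${SINCEDB_PATH}"
    start_position => "beginning"
  }
}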

To make debugging easier, remove the codec => json so each line arrives as a plain message you can inspect in the rubydebug output.
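
Putting those suggestions together, a minimal sketch of a debugging configuration could look like this (the /tmp/sincedb_test path is only an example of an absolute path to a file):

input {
  file {
    # No json codec: each line arrives as a raw message for inspection
    path => ["/db/seed/test.json"]
    # Example absolute path to a file; choose any writable location
    sincedb_path => "/tmp/sincedb_test"
    start_position => "beginning"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}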
