Unable to ingest JSON file and output to stdout

I'm new to Logstash, and I've attempted to start with a simple JSON file, but I can't get anything to output to stdout. It just hangs after Successfully started Logstash API endpoint {:port=>9646}.

I've looked at a few posts to try to get some ideas, to no avail.

I've been able to get Logstash to work with the input plugins stdin and exec, but I really would like to get it to work with a .json file. Here's what I have:

test.json:

{
  "message": "test"
}

logstash.conf:

input {
  file {
    codec => json
    path => ["/db/seed/test.json"]
    start_position => "beginning"
  }
}

output {
  stdout {
    codec => "rubydebug"
  }
}

Output:

[2020-04-06T18:28:59,741][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.queue", :path=>"/Users/<project_path>/data/logstash/2020-04-06_22-28-40/queue"}
[2020-04-06T18:28:59,872][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.dead_letter_queue", :path=>"/Users/<project_path>/data/logstash/2020-04-06_22-28-40/dead_letter_queue"}
[2020-04-06T18:28:59,971][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2020-04-06T18:28:59,982][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.6.2"}
[2020-04-06T18:29:00,013][INFO ][logstash.agent           ] No persistent UUID file found. Generating new UUID {:uuid=>"dbafbee1-c8e9-44bf-889c-0cc21921765b", :path=>"/Users/<project_path>/data/logstash/2020-04-06_22-28-40/uuid"}
[2020-04-06T18:29:01,781][INFO ][org.reflections.Reflections] Reflections took 39 ms to scan 1 urls, producing 20 keys and 40 values 
[2020-04-06T18:29:03,339][WARN ][org.logstash.instrument.metrics.gauge.LazyDelegatingGauge][main] A gauge metric of an unknown type (org.jruby.RubyArray) has been created for key: cluster_uuids. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.
[2020-04-06T18:29:03,400][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["/Users/<project_path>/logstash.conf"], :thread=>"#<Thread:0x796af4b2 run>"}
[2020-04-06T18:29:04,480][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
[2020-04-06T18:29:04,542][INFO ][filewatch.observingtail  ][main] START, creating Discoverer, Watch with file and sincedb collections
[2020-04-06T18:29:04,556][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2020-04-06T18:29:04,971][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9646}

All in all, the file input plugin looks like it should be quite simple, but here I am. Any insight or advice would be greatly appreciated.

Hello @leosoaivan

I have a few comments:

  1. The test.json file contains a multiline JSON document. Logstash will be able to parse the file with the json codec if it contains one JSON document per line, such as:

    { "message": "test" }
    { "message": "another test" }
    

    The format you're trying to parse is not straightforward to split into valid JSON documents, as there is no clear separator (unless we assume each document always starts with a {\n and ends with a }\n, with nothing else in between). If that is the case, you might first use a multiline codec, followed by a json filter; see the sketch after this list.

  2. As explained in our documentation, Logstash keeps a sincedb file to track the last line read. If you've run Logstash more than once, it is possible Logstash has already marked the lines as read. You can customize the sincedb_path or, if needed, remove the file (by default located at <path.data>/plugins/inputs/file).
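
For reference, here is a minimal sketch of the multiline approach from point 1. It assumes each JSON document starts with a { at the beginning of a line; the auto_flush_interval and the /dev/null sincedb_path are illustrative choices for local testing, not required settings:

input {
  file {
    path => ["/db/seed/test.json"]
    start_position => "beginning"
    # Illustrative: discard sincedb state so every test run re-reads the file
    sincedb_path => "/dev/null"
    codec => multiline {
      # Lines that do NOT start with "{" are appended to the previous event
      pattern => "^\{"
      negate => true
      what => "previous"
      # Flush the final event even if no new "{" line ever arrives
      auto_flush_interval => 1
    }
  }
}

filter {
  # Parse the reassembled multiline string into structured fields
  json {
    source => "message"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}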

@Luca_Belluccini thanks for the quick reply.

I noted your first comment. Thank you for that tip.

As for sincedb_path, it appears that Logstash generates one based on the path.data flag I pass on the CLI. I had set path.data to a dynamically timestamped directory ($PWD/data/logstash/$TIMESTAMP), as I was running into the Logstash could not be started because there is already another instance using the configured data directory error.

In the end, I see the sincedb_path file being created, though it's empty.

I don't know if this is helpful, but the script I use to run Logstash within the project is as follows:

#!/usr/bin/env bash

TIMESTAMP=$(date -u +%Y-%m-%d_%H-%M-%S)

# Data and log directories
export DATA_DIR="$PWD/data/logstash/$TIMESTAMP"
export LOG_DIR="$PWD/log/logstash/$TIMESTAMP"

# Run Logstash
logstash -f "$PWD/logstash.conf" --path.data "$DATA_DIR" --path.logs "$LOG_DIR"

Hello @leosoaivan

I would discourage changing the DATA_DIR and LOG_DIR on each run.

By default, the sincedb file is located inside path.data, but I would suggest specifying its location explicitly in the file input (doc).

Customizing those parameters (path.data and path.logs) is useful if you're running multiple Logstash instances on the same host.

input {
  file {
    codec => json
    path => ["/db/seed/test.json"]
    # sincedb_path must point to a file, not a directory
    sincedb_path => "/some/location/sincedb"
    start_position => "beginning"
  }
}

@Luca_Belluccini, thanks for your input.

I've removed the timestamped path.data and I've revised the input as follows:

input {
  file {
    codec => json
    path => ["/db/seed/test.json"]
    sincedb_path => "logstash/data/plugins/inputs/file/sincedb_file"
    start_position => "beginning"
  }
}

Unfortunately, the results are still the same. I've also added --log.level=debug and am seeing the following messages repeat after the Logstash API endpoint has started:

[2020-04-07T16:17:31,712][DEBUG][logstash.instrument.periodicpoller.cgroup] One or more required cgroup files or directories not found: /proc/self/cgroup, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/cpu
[2020-04-07T16:17:31,917][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2020-04-07T16:17:31,920][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2020-04-07T16:17:35,855][DEBUG][org.logstash.execution.PeriodicFlush][main] Pushing flush onto pipeline.

The logs you've shared are normal; they are triggered by the internal collection of monitoring information.

For the sincedb_path, use an absolute path to a file (and verify the file actually gets created). Likewise, make sure the path option contains the full, absolute path to your input file.

If you wish, you can also use environment variables in the pipeline file (doc).
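
For example, assuming an environment variable named SINCEDB_PATH is exported before Logstash starts (the variable name here is only an example), a sketch could be:

input {
  file {
    path => ["/db/seed/test.json"]
    # ${SINCEDB_PATH} is substituted from the environment when the pipeline loads
    sincedb_path => "${SINCEDB_PATH}"
    start_position => "beginning"
  }
}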

To make debugging easier, remove the codec => json so each line arrives as a plain message you can inspect in the rubydebug output.
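
Putting those suggestions together, a minimal sketch of a debugging configuration could look like this (the /tmp/sincedb_test path is only an example of an absolute path to a file):

input {
  file {
    # No json codec: each line arrives as a raw message for inspection
    path => ["/db/seed/test.json"]
    # Example absolute path to a file; choose any writable location
    sincedb_path => "/tmp/sincedb_test"
    start_position => "beginning"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}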
