Logstash 7.8 doesn't re-parse existing file

Hi team,

I'm using Elastic Stack 7.8. The Logstash configuration below doesn't read the old files in the /usr/data/nsg/ folder.

input {
  file {
    path => "/usr/data/nsg/*"
    start_position => "beginning"
    codec => "json"
    sincedb_path => "/dev/null"
    ignore_older => 0
  }
}
output {
  stdout {
   codec => rubydebug
  }
}

Here is the debug log:

    [DEBUG] 2020-07-10 09:49:14.186 [Converge PipelineAction::Create<main>] json - config LogStash::Codecs::JSON/@id = "json_82dfaaf6-5a13-49ce-8d4b-10608037a9ef"
    [DEBUG] 2020-07-10 09:49:14.189 [Converge PipelineAction::Create<main>] json - config LogStash::Codecs::JSON/@enable_metric = true
    [DEBUG] 2020-07-10 09:49:14.191 [Converge PipelineAction::Create<main>] json - config LogStash::Codecs::JSON/@charset = "UTF-8"
    [DEBUG] 2020-07-10 09:49:14.232 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@start_position = "beginning"
    [DEBUG] 2020-07-10 09:49:14.232 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@path = ["/usr/data/nsg/*"]
    [DEBUG] 2020-07-10 09:49:14.240 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@codec = <LogStash::Codecs::JSON id=>"json_82dfaaf6-5a13-49ce-8d4b-10608037a9ef", enable_metric=>true, charset=>"UTF-8">
    [DEBUG] 2020-07-10 09:49:14.240 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@id = "d3310e797726d05295337b7ef7997957226f86065ef90cb6324230db58e707fa"
    [DEBUG] 2020-07-10 09:49:14.240 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@sincedb_path = "/dev/null"
    [DEBUG] 2020-07-10 09:49:14.241 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@ignore_older = 0.0
    [DEBUG] 2020-07-10 09:49:14.241 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@enable_metric = true
    [DEBUG] 2020-07-10 09:49:14.241 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@add_field = {}
    [DEBUG] 2020-07-10 09:49:14.242 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@stat_interval = 1.0
    [DEBUG] 2020-07-10 09:49:14.242 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@discover_interval = 15
    [DEBUG] 2020-07-10 09:49:14.242 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@sincedb_write_interval = 15.0
    [DEBUG] 2020-07-10 09:49:14.242 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@delimiter = "\n"
    [DEBUG] 2020-07-10 09:49:14.242 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@close_older = 3600.0
    [DEBUG] 2020-07-10 09:49:14.242 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@mode = "tail"
    [DEBUG] 2020-07-10 09:49:14.246 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@file_completed_action = "delete"
    [DEBUG] 2020-07-10 09:49:14.246 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@sincedb_clean_after = 1209600.0
    [DEBUG] 2020-07-10 09:49:14.246 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@file_chunk_size = 32768
    [DEBUG] 2020-07-10 09:49:14.246 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@file_chunk_count = 140737488355327
    [DEBUG] 2020-07-10 09:49:14.247 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@file_sort_by = "last_modified"
    [DEBUG] 2020-07-10 09:49:14.247 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@file_sort_direction = "asc"
    [DEBUG] 2020-07-10 09:49:14.247 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@exit_after_read = false
    [DEBUG] 2020-07-10 09:49:14.247 [Converge PipelineAction::Create<main>] file - config LogStash::Inputs::File/@check_archive_validity = false
    [DEBUG] 2020-07-10 09:49:14.285 [Converge PipelineAction::Create<main>] registry - On demand adding plugin to the registry {:name=>"stdout", :type=>"output", :class=>LogStash::Outputs::Stdout}
    [DEBUG] 2020-07-10 09:49:14.307 [Converge PipelineAction::Create<main>] registry - On demand adding plugin to the registry {:name=>"rubydebug", :type=>"codec", :class=>LogStash::Codecs::RubyDebug}
    [DEBUG] 2020-07-10 09:49:14.313 [Converge PipelineAction::Create<main>] rubydebug - config LogStash::Codecs::RubyDebug/@id = "rubydebug_3dd38c72-ab89-4286-9fea-9ae11d547fcf"
    [DEBUG] 2020-07-10 09:49:14.313 [Converge PipelineAction::Create<main>] rubydebug - config LogStash::Codecs::RubyDebug/@enable_metric = true
    [DEBUG] 2020-07-10 09:49:14.313 [Converge PipelineAction::Create<main>] rubydebug - config LogStash::Codecs::RubyDebug/@metadata = false
    [DEBUG] 2020-07-10 09:49:14.395 [Converge PipelineAction::Create<main>] stdout - config LogStash::Outputs::Stdout/@codec = <LogStash::Codecs::RubyDebug id=>"rubydebug_3dd38c72-ab89-4286-9fea-9ae11d547fcf", enable_metric=>true, metadata=>false>
    [DEBUG] 2020-07-10 09:49:14.397 [Converge PipelineAction::Create<main>] stdout - config LogStash::Outputs::Stdout/@id = "8ba6fd2959e72b372718339ec5c263f798ba81a3125c21b28f58c18a8d164272"
    [DEBUG] 2020-07-10 09:49:14.397 [Converge PipelineAction::Create<main>] stdout - config LogStash::Outputs::Stdout/@enable_metric = true
    [DEBUG] 2020-07-10 09:49:14.398 [Converge PipelineAction::Create<main>] stdout - config LogStash::Outputs::Stdout/@workers = 1
    [DEBUG] 2020-07-10 09:49:14.439 [Converge PipelineAction::Create<main>] javapipeline - Starting pipeline {:pipeline_id=>"main"}
    [INFO ] 2020-07-10 09:49:14.539 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/usr/share/logstash/bin/tlocal.conf"], :thread=>"#<Thread:0x2a625cbd run>"}
    [DEBUG] 2020-07-10 09:49:15.309 [Ruby-0-Thread-5: :1] CompiledPipeline - Compiled output
     P[output-stdout{"codec"=>"rubydebug"}|[file]/usr/share/logstash/bin/tlocal.conf:12:3:```
    stdout {
       codec => rubydebug
      }
    ```]
     into
     org.logstash.config.ir.compiler.ComputeStepSyntaxElement@3a13aaf1
    [DEBUG] 2020-07-10 09:49:15.319 [Ruby-0-Thread-6: :1] CompiledPipeline - Compiled output
     P[output-stdout{"codec"=>"rubydebug"}|[file]/usr/share/logstash/bin/tlocal.conf:12:3:```
    stdout {
       codec => rubydebug
      }
    ```]
     into
     org.logstash.config.ir.compiler.ComputeStepSyntaxElement@3a13aaf1
    [INFO ] 2020-07-10 09:49:15.680 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
    [DEBUG] 2020-07-10 09:49:15.689 [Converge PipelineAction::Create<main>] javapipeline - Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x2a625cbd run>"}
    [DEBUG] 2020-07-10 09:49:15.717 [logstash-pipeline-flush] PeriodicFlush - Pushing flush onto pipeline.
    [INFO ] 2020-07-10 09:49:15.748 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
    [DEBUG] 2020-07-10 09:49:15.804 [Api Webserver] agent - Starting puma
    [DEBUG] 2020-07-10 09:49:15.814 [Api Webserver] agent - Trying to start WebServer {:port=>9600}
    [INFO ] 2020-07-10 09:49:15.815 [[main]<file] observingtail - START, creating Discoverer, Watch with file and sincedb collections
    [DEBUG] 2020-07-10 09:49:15.897 [Api Webserver] service - [api-service] start
    [INFO ] 2020-07-10 09:49:16.100 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
    [DEBUG] 2020-07-10 09:49:17.618 [pool-3-thread-1] jvm - collector name {:name=>"ParNew"}

In filebeat, setting ignore_older to zero tells filebeat not to filter files based on age.

In a logstash file input, setting ignore_older to zero tells the input to ignore any files more than zero seconds old, which is every file.
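
If the goal is to read everything in the directory regardless of age, the option can simply be omitted, or set to a large value in seconds to skip only very old files. A minimal sketch, with 86400 as an arbitrary example value:

input {
  file {
    path => "/usr/data/nsg/*"
    start_position => "beginning"
    codec => "json"
    sincedb_path => "/dev/null"
    # ignore_older => 86400   # optional: skip only files not modified for more than a day
  }
}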

Thanks for your reply.

I removed "ignore_older => 0", but Logstash 7.8 still doesn't read the existing files in "/usr/data/nsg/". Here is the new debug log:

Reflections - Reflections took 45 ms to scan 1 urls, producing 21 keys and 41 values
Reflections - expanded subtype co.elastic.logstash.api.Plugin -> co.elastic.logstash.api.Input
Reflections - expanded subtype co.elastic.logstash.api.Plugin -> co.elastic.logstash.api.Codec
Reflections - expanded subtype org.jruby.RubyBasicObject -> org.jruby.RubyObject
Reflections - expanded subtype java.lang.Cloneable -> org.jruby.RubyBasicObject
Reflections - expanded subtype org.jruby.runtime.builtin.IRubyObject -> org.jruby.RubyBasicObject
Reflections - expanded subtype java.io.Serializable -> org.jruby.RubyBasicObject
Reflections - expanded subtype java.lang.Comparable -> org.jruby.RubyBasicObject
Reflections - expanded subtype org.jruby.runtime.marshal.CoreObjectType -> org.jruby.RubyBasicObject
Reflections - expanded subtype org.jruby.runtime.builtin.InstanceVariables -> org.jruby.RubyBasicObject
Reflections - expanded subtype org.jruby.runtime.builtin.InternalVariables -> org.jruby.RubyBasicObject
Reflections - expanded subtype co.elastic.logstash.api.Plugin -> co.elastic.logstash.api.Output
Reflections - expanded subtype co.elastic.logstash.api.Metric -> co.elastic.logstash.api.NamespacedMetric
Reflections - expanded subtype java.security.SecureClassLoader -> java.net.URLClassLoader
Reflections - expanded subtype java.lang.ClassLoader -> java.security.SecureClassLoader
Reflections - expanded subtype java.io.Closeable -> java.net.URLClassLoader
Reflections - expanded subtype java.lang.AutoCloseable -> java.io.Closeable
Reflections - expanded subtype java.lang.Comparable -> java.lang.Enum
Reflections - expanded subtype java.io.Serializable -> java.lang.Enum
Reflections - expanded subtype co.elastic.logstash.api.Plugin -> co.elastic.logstash.api.Filter
registry - On demand adding plugin to the registry {:name=>"file", :type=>"input", :class=>LogStash::Inputs::File}
registry - On demand adding plugin to the registry {:name=>"json", :type=>"codec", :class=>LogStash::Codecs::JSON}
json - config LogStash::Codecs::JSON/@id = "json_2ff734cf-d281-4a95-a0d9-d8cbc545b730"
json - config LogStash::Codecs::JSON/@enable_metric = true
json - config LogStash::Codecs::JSON/@charset = "UTF-8"
file - config LogStash::Inputs::File/@start_position = "beginning"
file - config LogStash::Inputs::File/@path = ["/usr/data/nsg/*"]
file - config LogStash::Inputs::File/@codec = <LogStash::Codecs::JSON id=>"json_2ff734cf-d281-4a95-a0d9-d8cbc545b730", enable_metric=>tru
file - config LogStash::Inputs::File/@id = "72d69bed517c441e5624a174d4ed2637f7365a0209e521ed2e55963ef165a5ca"
file - config LogStash::Inputs::File/@sincedb_path = "/dev/null"
file - config LogStash::Inputs::File/@enable_metric = true
file - config LogStash::Inputs::File/@add_field = {}
file - config LogStash::Inputs::File/@stat_interval = 1.0
file - config LogStash::Inputs::File/@discover_interval = 15
file - config LogStash::Inputs::File/@sincedb_write_interval = 15.0
file - config LogStash::Inputs::File/@delimiter = "\n"
file - config LogStash::Inputs::File/@close_older = 3600.0
file - config LogStash::Inputs::File/@mode = "tail"
file - config LogStash::Inputs::File/@file_completed_action = "delete"
file - config LogStash::Inputs::File/@sincedb_clean_after = 1209600.0
file - config LogStash::Inputs::File/@file_chunk_size = 32768
file - config LogStash::Inputs::File/@file_chunk_count = 140737488355327
file - config LogStash::Inputs::File/@file_sort_by = "last_modified"
file - config LogStash::Inputs::File/@file_sort_direction = "asc"
file - config LogStash::Inputs::File/@exit_after_read = false
file - config LogStash::Inputs::File/@check_archive_validity = false
registry - On demand adding plugin to the registry {:name=>"stdout", :type=>"output", :class=>LogStash::Outputs::Stdout}
registry - On demand adding plugin to the registry {:name=>"rubydebug", :type=>"codec", :class=>LogStash::Codecs::RubyDebug}
rubydebug - config LogStash::Codecs::RubyDebug/@id = "rubydebug_ee0ca518-1668-4072-ab50-bd2142b5d3a1"
rubydebug - config LogStash::Codecs::RubyDebug/@enable_metric = true
rubydebug - config LogStash::Codecs::RubyDebug/@metadata = false
stdout - config LogStash::Outputs::Stdout/@codec = <LogStash::Codecs::RubyDebug id=>"rubydebug_ee0ca518-1668-4072-ab50-bd2142b5d3a1", ena
stdout - config LogStash::Outputs::Stdout/@id = "47bb3b6796623d1e55a7c65004685a3c476a1173f64251c7cb84467ce5be4123"
stdout - config LogStash::Outputs::Stdout/@enable_metric = true
stdout - config LogStash::Outputs::Stdout/@workers = 1
javapipeline - Starting pipeline {:pipeline_id=>"main"}

Enable trace level logging. You will get a bunch of messages that should tell you what it is doing. Stuff like:

[2019-01-10T16:39:51,066][TRACE][filewatch.tailmode.handlers.createinitial] handle_specifically opened file handle: 85, path: foo.txt
[2019-01-10T16:39:51,093][TRACE][filewatch.tailmode.handlers.createinitial] add_new_value_sincedb_collection {"position"=>0, "watched_file details"=>"<FileWatch::Watched File: @filename='foo.txt', @state='active', @recent_states='[:watched, :watched]', @bytes_read='0', @bytes_unread='954', current_size='954', last_stat_size='954', file_open?='true', @initial=true, @sincedb_key='59521 0 51714'>"}
[2019-01-10T16:39:51,130][TRACE][filewatch.tailmode.processor] Active - file grew: foo.txt: new size is 954, bytes read 0

Thanks Badger.

The file input config part of the trace log:

file - config LogStash::Inputs::File/@start_position = "beginning"
file - config LogStash::Inputs::File/@path = ["/usr/data/nsg/*"]
file - config LogStash::Inputs::File/@codec = <LogStash::Codecs::JSON id=>"json_3cfcc2b3-27ba-4b9a-a84c-63d28e083233", enable_metric=>tr
file - config LogStash::Inputs::File/@id = "20f0cdfa17e0faea50ad3ab101628fef32ea94a8852f643b6955d66b7d7954e4"
file - config LogStash::Inputs::File/@sincedb_path = "/dev/null"
file - config LogStash::Inputs::File/@enable_metric = true
file - config LogStash::Inputs::File/@add_field = {}
file - config LogStash::Inputs::File/@stat_interval = 1.0
file - config LogStash::Inputs::File/@discover_interval = 15
file - config LogStash::Inputs::File/@sincedb_write_interval = 15.0
file - config LogStash::Inputs::File/@delimiter = "\n"
file - config LogStash::Inputs::File/@close_older = 3600.0
file - config LogStash::Inputs::File/@mode = "tail"
file - config LogStash::Inputs::File/@file_completed_action = "delete"
file - config LogStash::Inputs::File/@sincedb_clean_after = 1209600.0
file - config LogStash::Inputs::File/@file_chunk_size = 32768
file - config LogStash::Inputs::File/@file_chunk_count = 140737488355327
file - config LogStash::Inputs::File/@file_sort_by = "last_modified"
file - config LogStash::Inputs::File/@file_sort_direction = "asc"
file - config LogStash::Inputs::File/@exit_after_read = false
file - config LogStash::Inputs::File/@check_archive_validity = false

The file reading part of the trace log:

[TRACE] 2020-07-14 06:12:42.258 [[main]<file] createinitial - handling: PT1H.json
[TRACE] 2020-07-14 06:12:42.276 [[main]<file] createinitial - opening PT1H.json
[TRACE] 2020-07-14 06:12:42.292 [[main]<file] createinitial - handle_specifically opened file handle: 85, path: PT1H.json
[TRACE] 2020-07-14 06:12:42.340 [[main]<file] createinitial - add_new_value_sincedb_collection {"position"=>0, "watched_file details"=>"<FileWatch::WatchedFile: @filename='PT1H.json', @state='active', @recent_states='[:watched, :watched]', @bytes_read='0', @bytes_unread='141968', current_size='141968', last_stat_size='141968', file_open?='true', @initial=true, @sincedb_key='790986 0 2065'>"}
[TRACE] 2020-07-14 06:12:42.425 [[main]<file] processor - Active - file grew: PT1H.json: new size is 141968, bytes read 0
[TRACE] 2020-07-14 06:12:42.428 [[main]<file] grow - handling: PT1H.json
[TRACE] 2020-07-14 06:12:42.444 [[main]<file] grow - reading... {"iterations"=>4, "amount"=>32768, "filename"=>"PT1H.json"}
[DEBUG] 2020-07-14 06:12:42.445 [[main]<file] grow - read_to_eof: get chunk
[TRACE] 2020-07-14 06:12:42.473 [[main]<file] grow - buffer_extract: a delimiter can't be found in current chunk, maybe there are no more delimiters or the delimiter is incorrect or the text before the delimiter, a 'line', is very large, if this message is logged often try increasing the `file_chunk_size` setting. {"delimiter"=>"\n", "read_position"=>0, "bytes_read_count"=>32768, "last_known_file_size"=>141968, "file_path"=>"/usr/data/nsg/PT1H.json"}

The full conf:

input {
  file {
    path => "/usr/data/nsg/*"
    start_position => "beginning"
    codec => "json"
    sincedb_path => "/dev/null"
  }
}
output {
  stdout {
   codec => rubydebug
  }
}

A file input reads lines from a file. It did not find a line ending in the first 32768 bytes. If the file does not contain a newline, the file input cannot read it. If there are lines in the file longer than 32768 bytes, then increase file_chunk_size.
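
For the second case, a minimal sketch of what raising file_chunk_size might look like (262144 is an arbitrary example value; it just needs to be larger than the longest line in your files):

input {
  file {
    path => "/usr/data/nsg/*"
    start_position => "beginning"
    codec => "json"
    sincedb_path => "/dev/null"
    file_chunk_size => 262144   # example: 256 KB chunks instead of the 32 KB default
  }
}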

Thanks Badger.

My JSON file looks like the one below; it seems the file input can't parse it as JSON. Is there any way to handle a multi-line JSON document in an xxx.json file?

{
  "array": [
    1,
    2,
    3
  ],
  "boolean": true,
  "color": "gold",
  "null": null,
  "number": 123,
  "object": {
    "a": "b",
    "c": "d"
  },
  "string": "Hello World"
}

Does anyone know the answer?
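
For reference, one common approach for files that each hold a pretty-printed JSON document like the sample above is to re-assemble the document with a multiline codec and then parse it with a json filter. This is only a sketch, assuming every document starts with a { in column 0; the pattern, auto_flush_interval, and max_lines values are illustrative, not taken from this thread:

input {
  file {
    path => "/usr/data/nsg/*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # Append any line that does NOT start a new document (a "{" in column 0)
    # to the previous line, so the whole pretty-printed document becomes one event.
    # auto_flush_interval flushes the last document even though no further "{" arrives.
    codec => multiline {
      pattern => "^\{"
      negate => true
      what => "previous"
      auto_flush_interval => 2
      max_lines => 10000
    }
  }
}
filter {
  # Parse the re-assembled text in the message field as JSON.
  json { source => "message" }
}
output {
  stdout { codec => rubydebug }
}

Note that in tail mode the file input only hands complete, delimiter-terminated lines to the codec, so a file without a trailing newline may still hold back its final line.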

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.