Logstash wildcards / regex not working

So, it looks to be throwing in another forward slash within the regex statement.

Anyway around that? Must be missing something.

[2019-01-14T16:40:28,141][DEBUG][logstash.inputs.file     ] config LogStash::Inputs::File/@start_position = "beginning"
[2019-01-14T16:40:28,142][DEBUG][logstash.inputs.file     ] config LogStash::Inputs::File/@path = ["C:/Logs/20190107/icnserver12/SystemOut.*\\.log"]

If your config contains

path -> "/some/path/.*\.log"

then the log message will escape the backslash

 config LogStash::Inputs::File/@path = ["/some/path/.*\\.log"]

It is not the regex style you are thinking it is. Your pattern should be

path -> "/some/path/*.log"

Then you should get these messages in the log

[2019-01-14T18:23:43,939][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
[2019-01-14T18:23:43,942][TRACE][filewatch.discoverer     ] discover_files {"count"=>1}
[2019-01-14T18:23:44,084][TRACE][filewatch.discoverer     ] discover_files handling: {"new discovery"=>true, "watched_file details"=>"<FileWatch::WatchedFile: @filename='foo.csv', @state='watched', @recent_states='[:watched]', @bytes_read='0', @bytes_unread='0', current_size='14', last_stat_size='14', file_open?='false', @initial=true, @sincedb_key='53823 0 51714'>"}

Changed it and it finds nothing within the directory...which is odd as there are 5 files sitting there.

I have (previously) tried every iteration in that path with wildcards in various places.

I don't think that I've ever had success getting this working properly.

Thoughts?

Is there a way to further debug 'why' it isn't picking up files from that directory? If I change the parameter to the literal 'SystemOut.log' it processes just fine.

Line 6994: [2019-01-14T17:41:51,174][DEBUG][logstash.inputs.file     ] config LogStash::Inputs::File/@path = ["C:/Logs/20190107/icnserver12/*.log"]
Line 6995: [2019-01-14T17:41:51,201][DEBUG][logstash.inputs.file     ] config LogStash::Inputs::File/@codec = <LogStash::Codecs::Multiline pattern=>"^\\[", charset=>"ISO-8859-1", what=>"previous", id=>"75ba5c03-c41c-42bc-abb5-eac2b3345b45", negate=>true, enable_metric=>true, multiline_tag=>"multiline", max_lines=>500, max_bytes=>10485760>
Line 6996: [2019-01-14T17:41:51,206][DEBUG][logstash.inputs.file     ] config LogStash::Inputs::File/@id = "23628c8d93ba36bc12755df47ca48837d2354369567437d23ed175bd6c437a3c"

Enable trace level logging and search for filewatch in the logfile.

So, of interest it does find the files in the directory (discover_files {"count"=>2}) yet it doesn't process them. Can't determine why just yet. Maybe the trace will shed some additional light.

[2019-01-14T19:26:26,675][TRACE][filewatch.discoverer     ] discover_files {"count"=>2}
[2019-01-14T19:26:26,833][TRACE][filewatch.discoverer     ] discover_files handling: {"new discovery"=>true, "watched_file details"=>"<FileWatch::WatchedFile: @filename='SystemOut.log', @state='watched', @recent_states='[:watched]', @bytes_read='0', @bytes_unread='0', current_size='14081405', last_stat_size='14081405', file_open?='false', @initial=true, @sincedb_key='unknown 0 0'>"}
[2019-01-14T19:26:26,839][TRACE][filewatch.sincedbcollection] associate: finding {"inode"=>"unknown", "path"=>"C:/Logs/20190108/icnserver12/SystemOut.log"}
[2019-01-14T19:26:26,848][TRACE][filewatch.sincedbcollection] associate: unmatched
[2019-01-14T19:26:26,873][TRACE][filewatch.discoverer     ] discover_files handling: {"new discovery"=>true, "watched_file details"=>"<FileWatch::WatchedFile: @filename='SystemOut_19.01.07_11.33.45.log', @state='watched', @recent_states='[:watched]', @bytes_read='0', @bytes_unread='0', current_size='26385775', last_stat_size='26385775', file_open?='false', @initial=true, @sincedb_key='unknown 0 0'>"}
[2019-01-14T19:26:26,874][TRACE][filewatch.sincedbcollection] associate: finding {"inode"=>"unknown", "path"=>"C:/Logs/20190108/icnserver12/SystemOut_19.01.07_11.33.45.log"}
[2019-01-14T19:26:26,875][TRACE][filewatch.sincedbcollection] associate: unmatched
[2019-01-14T19:26:26,951][TRACE][filewatch.tailmode.processor] Delayed Delete processing
[2019-01-14T19:26:26,964][TRACE][filewatch.tailmode.processor] Watched + Active restat processing
[2019-01-14T19:26:27,006][TRACE][filewatch.tailmode.processor] Rotation In Progress processing
[2019-01-14T19:26:27,018][TRACE][filewatch.tailmode.processor] Watched processing
[2019-01-14T19:26:27,036][TRACE][filewatch.tailmode.handlers.createinitial] handling: SystemOut.log
[2019-01-14T19:26:27,055][TRACE][filewatch.tailmode.handlers.createinitial] opening SystemOut.log
[2019-01-14T19:26:27,079][TRACE][filewatch.tailmode.handlers.createinitial] handle_specifically opened file handle: 100750, path: SystemOut.log
[2019-01-14T19:26:27,084][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2019-01-14T19:26:27,118][TRACE][filewatch.tailmode.handlers.createinitial] add_new_value_sincedb_collection {"position"=>0, "watched_file details"=>"<FileWatch::WatchedFile: @filename='SystemOut.log', @state='active', @recent_states='[:watched, :watched]', @bytes_read='0', @bytes_unread='14081405', current_size='14081405', last_stat_size='14081405', file_open?='true', @initial=true, @sincedb_key='unknown 0 0'>"}
[2019-01-14T19:26:27,127][TRACE][filewatch.tailmode.handlers.createinitial] handling: SystemOut_19.01.07_11.33.45.log
[2019-01-14T19:26:27,130][TRACE][filewatch.tailmode.handlers.createinitial] opening SystemOut_19.01.07_11.33.45.log
[2019-01-14T19:26:27,138][TRACE][filewatch.tailmode.handlers.createinitial] handle_specifically opened file handle: 100751, path: SystemOut_19.01.07_11.33.45.log
[2019-01-14T19:26:27,141][TRACE][filewatch.tailmode.handlers.createinitial] add_or_update_sincedb_collection: found sincedb record {"sincedb key"=>#<struct FileWatch::InodeStruct inode="unknown", maj=0, min=0>, "sincedb value"=>#<FileWatch::SincedbValue:0xb5d09d @last_changed_at=1547515587.112, @watched_file="<FileWatch::WatchedFile: @filename='SystemOut.log', @state='active', @sincedb_key='unknown 0 0, size=14081405>", @position=0>}
[2019-01-14T19:26:27,150][TRACE][filewatch.tailmode.handlers.createinitial] add_or_update_sincedb_collection: the found sincedb_value has a watched_file - this is a rename, switching inode to this watched file
[2019-01-14T19:26:27,154][TRACE][filewatch.tailmode.handlers.createinitial] add_or_update_sincedb_collection: switching from... {"watched_file details"=>"<FileWatch::WatchedFile: @filename='SystemOut_19.01.07_11.33.45.log', @state='active', @recent_states='[:watched, :watched]', @bytes_read='0', @bytes_unread='26385775', current_size='26385775', last_stat_size='26385775', file_open?='true', @initial=true, @sincedb_key='unknown 0 0'>"}
[2019-01-14T19:26:28,226][TRACE][filewatch.tailmode.handlers.unignore] handling: SystemOut_19.01.07_11.33.45.log
[2019-01-14T19:26:28,235][TRACE][filewatch.tailmode.handlers.unignore] update_existing_sincedb_collection_value: SystemOut_19.01.07_11.33.45.log, last value 0, cur size 26385775
[2019-01-14T19:26:28,239][TRACE][filewatch.tailmode.handlers.unignore] -------------------- >>>>> update_existing_specifically: unignore {"watched file"=>"<FileWatch::WatchedFile: @filename='SystemOut_19.01.07_11.33.45.log', @state='watched', @recent_states='[:watched, :watched, :active, :ignored]', @bytes_read='0', @bytes_unread='26385775', current_size='26385775', last_stat_size='26385775', file_open?='false', @initial=false, @sincedb_key='unknown 0 0'>", "sincedb value"=>#<FileWatch::SincedbValue:0xb5d09d @last_changed_at=1547515587.151, @watched_file="<FileWatch::WatchedFile: @filename='SystemOut_19.01.07_11.33.45.log', @state='watched', @sincedb_key='unknown 0 0, size=26385775>", @position=0>}
[2019-01-14T19:26:28,251][TRACE][filewatch.tailmode.processor] Delayed Delete processing
[2019-01-14T19:26:28,253][TRACE][filewatch.tailmode.processor] Watched + Active restat processing
[2019-01-14T19:26:28,256][TRACE][filewatch.tailmode.processor] Rotation In Progress processing
[2019-01-14T19:26:28,257][TRACE][filewatch.tailmode.processor] Watched processing
[2019-01-14T19:26:28,259][TRACE][filewatch.tailmode.handlers.createinitial] handling: SystemOut.log
[2019-01-14T19:26:28,261][TRACE][filewatch.tailmode.handlers.createinitial] opening SystemOut.log
[2019-01-14T19:26:28,265][TRACE][filewatch.tailmode.handlers.createinitial] handle_specifically opened file handle: 100752, path: SystemOut.log
[2019-01-14T19:26:28,266][TRACE][filewatch.tailmode.handlers.createinitial] add_or_update_sincedb_collection: found sincedb record {"sincedb key"=>#<struct FileWatch::InodeStruct inode="unknown", maj=0, min=0>, "sincedb value"=>#<FileWatch::SincedbValue:0xb5d09d @last_changed_at=1547515588.248, @watched_file="<FileWatch::WatchedFile: @filename='SystemOut_19.01.07_11.33.45.log', @state='watched', @sincedb_key='unknown 0 0, size=26385775>", @position=0>}
[2019-01-14T19:26:28,267][TRACE][filewatch.tailmode.handlers.createinitial] add_or_update_sincedb_collection: the found sincedb_value has a watched_file - this is a rename, switching inode to this watched file

For SystemOut_19.01.07_11.33.45.log I think the trace is saying that it read the file when it was previously called SystemOut.log, so it is not going to reprocess it.

Now that the configuration appears to be finding the files, can you delete the existing sincedb?

The sinceDB path is currently set to null so it should believe it never parsed the logfile. Specifying /dev/null as sincedb_path or using NUL (Windows) within my environment path.

So, am I correct to assume that every file will be processed again, no matter what?

set ENV_SINCEDBPATH=NUL

Messages like that suggest to me that it is using a sincedb. On Window you should be using NUL.

Correct and confirmed. This is the setting that I am using within my configuration.

set ENV_SINCEDBPATH=NUL

How are you referencing that in your configuration?

I initially set environment variables...

set ENV_SINCEDBPATH=NUL

And then when I run the logstash using the configuration it calls the various environment variables.

input {  
  file {
      path => "C:/Logs/20190107/icnserver12/SystemOut_19.01.04_14.20.14.log"
      type => "${ENV_TYPE}"
      start_position => "beginning"
      sincedb_path => "${ENV_SINCEDBPATH}"
      tags => ["${ENV_TAG1}", "${ENV_TAG2}", "${ENV_TAG3}"]
	  codec => multiline {
					      pattern => "^\["
					      negate => true
						  what => previous
						  charset => "ISO-8859-1"
					     }
       } 

}

Run with trace logging and see what you get for these lines

[2019-01-15T11:44:21,267][TRACE][filewatch.sincedbcollection] open: reading from /dev/null
[2019-01-15T11:44:21,268][TRACE][filewatch.sincedbcollection] open: count of keys read: 0

Of interest....these are the 'filewatch.sincedbcollection' info in the trace file.

Line 703: [2019-01-15T11:43:10,738][TRACE][filewatch.sincedbcollection] open: reading from NUL
Line 704: [2019-01-15T11:43:10,754][TRACE][filewatch.sincedbcollection] open: count of keys read: 0
Line 708: [2019-01-15T11:43:11,042][TRACE][filewatch.sincedbcollection] associate: finding {"inode"=>"unknown", "path"=>"C:/Logs/20190115/icnserver12/SystemErr.log"}
Line 709: [2019-01-15T11:43:11,051][TRACE][filewatch.sincedbcollection] associate: unmatched
Line 711: [2019-01-15T11:43:11,104][TRACE][filewatch.sincedbcollection] associate: finding {"inode"=>"unknown", "path"=>"C:/Logs/20190115/icnserver12/SystemErr_19.01.14_17.17.57.log"}
Line 712: [2019-01-15T11:43:11,106][TRACE][filewatch.sincedbcollection] associate: unmatched
Line 714: [2019-01-15T11:43:11,128][TRACE][filewatch.sincedbcollection] associate: finding {"inode"=>"unknown", "path"=>"C:/Logs/20190115/icnserver12/SystemOut.log"}
Line 715: [2019-01-15T11:43:11,133][TRACE][filewatch.sincedbcollection] associate: unmatched
Line 717: [2019-01-15T11:43:11,141][TRACE][filewatch.sincedbcollection] associate: finding {"inode"=>"unknown", "path"=>"C:/Logs/20190115/icnserver12/SystemOut_19.01.14_14.07.58.log"}
Line 718: [2019-01-15T11:43:11,142][TRACE][filewatch.sincedbcollection] associate: unmatched
Line 2744: [2019-01-15T11:44:32,269][TRACE][filewatch.sincedbcollection] caller requested sincedb write (tail mode subscribe complete - shutting down)
Line 2745: [2019-01-15T11:44:32,280][TRACE][filewatch.sincedbcollection] sincedb_write: to: NUL

And, I can see that it does know that all four (4) files are there...yet it doesn't read them at all.

[2019-01-15T11:43:10,872][TRACE][filewatch.discoverer     ] discover_files {"count"=>4}
[2019-01-15T11:43:11,034][TRACE][filewatch.discoverer     ] discover_files handling: {"new discovery"=>true, "watched_file details"=>"<FileWatch::WatchedFile: @filename='SystemErr.log', @state='watched', @recent_states='[:watched]', @bytes_read='0', @bytes_unread='0', current_size='1186918', last_stat_size='1186918', file_open?='false', @initial=true, @sincedb_key='unknown 0 0'>"}
[2019-01-15T11:43:11,042][TRACE][filewatch.sincedbcollection] associate: finding {"inode"=>"unknown", "path"=>"C:/Logs/20190115/icnserver12/SystemErr.log"}
[2019-01-15T11:43:11,051][TRACE][filewatch.sincedbcollection] associate: unmatched

Is it possible there is an issue with the multiline codec pattern? If it never triggers the end condition then you would see log lines like

[DEBUG][logstash.inputs.file     ] Received line {:path=>"/home/user/foo.txt", ...

but it would just accumulate more and more data in memory and not flush it until logstash will terminated.

I would think that if there were an issue with the multi-line codec pattern then the issue would still be resident even if I specified the exact file name. Thus, I can process the files if I input each and every file name directly.

Extremely odd.

It opens SystemOut.log and assigns it @sincedb_key='unknown 0 0'. It then opens SystemOut_19.01.07_11.33.45.log and assigns it @sincedb_key='unknown 0 0'. It looks in the in-memory sincedb and finds it already has an entry for that key and then infers that the second file is actually the first file after being rotated. Hence the "this is a rename, switching inode to this watched file", which will cause it to ignore one of the files. At that point things are totally pear-shaped.

The underlying issue is: sincedb_key set to 'unknown 0 0' for wildcard file input on Windows. So I suggest you ask a new question focused on that and hope that guy sees it.

[2019-01-14T19:26:27,036][TRACE][filewatch.tailmode.handlers.createinitial] handling: SystemOut.log
[2019-01-14T19:26:27,055][TRACE][filewatch.tailmode.handlers.createinitial] opening SystemOut.log
[2019-01-14T19:26:27,079][TRACE][filewatch.tailmode.handlers.createinitial] handle_specifically opened file handle: 100750, path: SystemOut.log
[2019-01-14T19:26:27,084][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2019-01-14T19:26:27,118][TRACE][filewatch.tailmode.handlers.createinitial] add_new_value_sincedb_collection {"position"=>0, "watched_file details"=>"<FileWatch::WatchedFile: @filename='SystemOut.log', @state='active', @recent_states='[:watched, :watched]', @bytes_read='0', @bytes_unread='14081405', current_size='14081405', last_stat_size='14081405', file_open?='true', @initial=true, @sincedb_key='unknown 0 0'>"}
[2019-01-14T19:26:27,127][TRACE][filewatch.tailmode.handlers.createinitial] handling: SystemOut_19.01.07_11.33.45.log
[2019-01-14T19:26:27,130][TRACE][filewatch.tailmode.handlers.createinitial] opening SystemOut_19.01.07_11.33.45.log
[2019-01-14T19:26:27,138][TRACE][filewatch.tailmode.handlers.createinitial] handle_specifically opened file handle: 100751, path: SystemOut_19.01.07_11.33.45.log
[2019-01-14T19:26:27,141][TRACE][filewatch.tailmode.handlers.createinitial] add_or_update_sincedb_collection: found sincedb record {"sincedb key"=>#<struct FileWatch::InodeStruct inode="unknown", maj=0, min=0>, "sincedb value"=>#<FileWatch::SincedbValue:0xb5d09d @last_changed_at=1547515587.112, @watched_file="<FileWatch::WatchedFile: @filename='SystemOut.log', @state='active', @sincedb_key='unknown 0 0, size=14081405>", @position=0>}
[2019-01-14T19:26:27,150][TRACE][filewatch.tailmode.handlers.createinitial] add_or_update_sincedb_collection: the found sincedb_value has a watched_file - this is a rename, switching inode to this watched file [2019-01-14T19:26:27,154][TRACE][filewatch.tailmode.handlers.createinitial] add_or_update_sincedb_collection: switching from... {"watched_file details"=>"<FileWatch::WatchedFile: @filename='SystemOut_19.01.07_11.33.45.log', @state='active', @recent_states='[:watched, :watched]', @bytes_read='0', @bytes_unread='26385775', current_size='26385775', last_stat_size='26385775', file_open?='true', @initial=true, @sincedb_key='unknown 0 0'>"}
[2019-01-14T19:26:28,226][TRACE][filewatch.tailmode.handlers.unignore] handling: SystemOut_19.01.07_11.33.45.log
[2019-01-14T19:26:28,235][TRACE][filewatch.tailmode.handlers.unignore] update_existing_sincedb_collection_value: SystemOut_19.01.07_11.33.45.log, last value 0, cur size 26385775 [2019-01-14T19:26:28,239][TRACE][filewatch.tailmode.handlers.unignore] -------------------- >>>>> update_existing_specifically: unignore {"watched file"=>"<FileWatch::WatchedFile: @filename='SystemOut_19.01.07_11.33.45.log', @state='watched', @recent_states='[:watched, :watched, :active, :ignored]', @bytes_read='0', @bytes_unread='26385775', current_size='26385775', last_stat_size='26385775', file_open?='false', @initial=false, @sincedb_key='unknown 0 0'>", "sincedb value"=>#<FileWatch::SincedbValue:0xb5d09d @last_changed_at=1547515587.151, @watched_file="<FileWatch::WatchedFile: @filename='SystemOut_19.01.07_11.33.45.log', @state='watched', @sincedb_key='unknown 0 0, size=26385775>", @position=0>}

@Badger
@Blake_2112

FYI. You can dynamically change targeted logging sub-systems via the LS API...

The [logging API](https://www.elastic.co/guide/en/logstash/6.4/logging.html#_logging_apis) allows for different levels of logging for different components in LS.

First do `curl -XGET 'localhost:9600/_node/logging?pretty'`
You see something like this:

{
  "host" : "Elastics-MacBook-Pro.local",
  "version" : "6.4.0",
  "http_address" : "127.0.0.1:9600",
  "id" : "8789409b-7126-4034-9347-de47e6ce12a9",
  "name" : "Elastics-MacBook-Pro.local",
  "loggers" : {
    "filewatch.discoverer" : "DEBUG",
    "filewatch.observingtail" : "DEBUG",
    "filewatch.sincedbcollection" : "DEBUG",
    "filewatch.tailmode.handlers.createinitial" : "DEBUG",
    "filewatch.tailmode.processor" : "DEBUG",
    "logstash.agent" : "DEBUG",
    "logstash.api.service" : "DEBUG",
    "logstash.codecs.json" : "DEBUG",
    ...
    "logstash.filters.date" : "DEBUG",
    "logstash.inputs.file" : "DEBUG",
    ...
    "logstash.outputs.stdout" : "DEBUG",
    "logstash.pipeline" : "DEBUG",
    ...
    "slowlog.logstash.codecs.json" : "TRACE",
    "slowlog.logstash.codecs.rubydebug" : "TRACE",
    "slowlog.logstash.filters.date" : "TRACE",
    "slowlog.logstash.inputs.file" : "TRACE",
    "slowlog.logstash.outputs.stdout" : "TRACE"
  }
}

The ones we will need to change are prefixed with filewatch
1. `filewatch.discoverer`, finds and feeds new files into the processing, discovered files become watched-files in the ignore state or the watched state.
2. `filewatch.observingtail`, acts as the callback path to the input and the queue
3. `filewatch.sincedbcollection`, manages the entries in the sincedb collection that persists to the sincedb file.
4. `filewatch.tailmode.handlers.createinitial`, handles reading from a file.
5. `filewatch.tailmode.processor`, manages the 'state' of a watched-file and detects deletions, rotations an  d if possible, tracks an inode through various renames.

There are a number of other handlers `grow`, `shrink` that handle a watched-file that has been seen to grow or shrink (truncated) since it was last (fully) read.

Using the API
Turn trace on:

curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'
{
    "logger.filewatch.discoverer" : "TRACE"
}
'

Turn trace off:

curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'
{
    "logger.filewatch.discoverer" : "WARN"
}
'

Or

curl -XPUT 'localhost:9600/_node/logging/reset?pretty'

NOTE: it might be a good idea to start LS with logging set to WARN in the logstash.yml so other logging is less verbose. 

@Badger
@Blake_2112

About the unknown 0 0 "inode", this is a bug. I wish I could track it down. I seen reports of it before but we could never find the underlying reason.

This occurs when the Windows Kernel32 API call fails but I don't know what causes the failure. I have done some testing on GCP Windows Server 2008 and 2016 (iirc) but I can't recreate a production simulator that gives me a chance to debug further.

From a file input operations perpective, inodes and their windows equivalent are supposed to be a unique identifier to some file based "content" and clearly "unknown 0 0" is far from unique. Any sincedb records with unknown 0 0 at the beginning of the line will be used to match positions of discovered files that have this same "inode" with incorrect results. FIle rotation can be triggered if more than one file is discovered with this "inode", again incorrect outcomes.

I would, very much, appreciate your help to uncover the true cause so I can fix this bug.

For example, is it that Windows has the file open for write at exactly the moment we do a stat call on the file?

Although I'm going to track this down on Windows this configuration is working just fine on my virtualized Linux environment. Multiple wildcards in the path are working as expected.

I see what I can do and when to further trace this Windows anomaly.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.