Recursive depth of Logstash - not finding my files?

(Stephen Greszczyszyn) #1

I'm really having problems getting Logstash to find the csv files I need to parse. I'm wondering whether it's my configuration or whether there is a limit to the recursion depth.

test@host-001:/data/tracelogger/tracelogger_data# find . -name "PerfMon*.csv"
./processed/AP4/10.66.6.21/2019-05-17_06-00-03/cm/log/ris/csv/PerfMon_05_17_2019_04_25.csv
./processed/AP4/10.66.6.21/2019-05-17_01-45-23/cm/log/ris/csv/PerfMon_05_17_2019_00_55.csv
./processed/AP4/10.66.6.21/2019-05-17_06-59-20/cm/log/ris/csv/PerfMon_05_17_2019_06_10.csv
./processed/AP4/10.66.6.21/2019-05-17_00-00-07/cm/log/ris/csv/PerfMon_05_16_2019_23_10.csv

However, Logstash doesn't find them with this configuration:

input {
  file {
    id => "perfmon-in"
    path => "/data/tracelogger/tracelogger_data/**/PerfMon*"
    tags => ["perfmon"]
  }
}

filter {
  csv {
    id => "perfmon-filter"
    autodetect_column_names => true
    convert => {
      "column1" => "date_time"
    }
  }
}

output {
  if "perfmon" in [tags] {
    elasticsearch {
      hosts => "${ELASTICSEARCH_HOST}:${ELASTICSEARCH_PORT}"
      index => "perfmon-%{+YYYY.MM.dd}"
      id => "perfmon-out"
    }
    stdout { codec => rubydebug }
  }
}

And I can see logstash configuring the pipeline OK in the logs:

[2019-05-17T11:06:37,762][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/usr/share/logstash/data/plugins/inputs/file/.sincedb_d0460663a664248d37e0e08a24eedf1f", :path=>["/data/tracelogger/tracelogger_data/**/PerfMon*"]}
[2019-05-17T11:06:37,816][INFO ][logstash.javapipeline ] Pipeline started {"pipeline.id"=>"perfmon"}
[2019-05-17T11:06:37,826][DEBUG][logstash.javapipeline ] Pipeline started successfully {:pipeline_id=>"perfmon", :thread=>"#<Thread:0x4de86a24 run>"}

I have debug logs on but I never see logstash finding any files.

(Charlie) #2

Could you please try with the absolute path?
path => "/data/tracelogger/tracelogger_data/**/PerfMon*.csv"

Also can you verify that the Logstash has the permissions to read the files from the directory?

1 Like
#3

All the filewatch file handling is logged at trace level.
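One way to enable that selectively (a sketch, assuming the stock `log4j2.properties` in Logstash's config directory) is to add a logger for just the filewatch namespace instead of raising the global level:

```properties
# Hypothetical log4j2.properties additions: trace logging for the filewatch
# classes used by the file input, without making everything else noisier.
logger.filewatch.name = filewatch
logger.filewatch.level = trace
```

Alternatively, starting Logstash with `--log.level=trace` turns everything up at once, which needs no config change but produces far more output.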

1 Like
(Stephen Greszczyszyn) #4

OK, thanks, I'll set the logging to that level.

(Stephen Greszczyszyn) #5

I've put in the explicit path now (no ** glob expansion) and I still don't see the file being picked up:
path => "/data/tracelogger/tracelogger_data/processed/AP4/10.66.132.36/2019-05-17_21-29-18/cm/log/ris/csv/PerfMon*"

I also know for sure the file can be found. I'm using Docker, but the /data volume is shared between host and container. I entered the container and ran find . -name "PerfMon*" from the /data/tracelogger/tracelogger_data/processed/ path and could find the .csv files.

I'm going to set up trace logging level to see what I find. Super frustrating, yikes.

(Stephen Greszczyszyn) #6

OK, I put the specific path in and I can see Logstash "watching" the file that it found. However, I guess it thinks that it has already parsed the csv, as it says "no change" based on the "sincedb_key".

Thanks for your help so far, I'm going to try and wipe the "sincedb" and try again, also to see if I can see the proper ** glob expansion now that I'm using the "trace" debugs.

[2019-05-17T14:46:34,446][TRACE][filewatch.tailmode.processor] Active - no change {"watched_file"=>"<FileWatch::WatchedFile: @filename='PerfMon_05_17_2019_20_48.csv', @state='active', @recent_states='[:watched, :watched]', @bytes_read='2038603', @bytes_unread='0', current_size='2038603', last_stat_size='2038603', file_open?='true', @initial=false, @sincedb_key='252055458 0 2065'>"}

(Charlie) #7

Unless you need to track those files as they rotate over time, you can point sincedb_path at /dev/null, but after a potential Logstash crash you will ingest the data again.
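For completeness, the sincedb bypass described above looks something like this (a sketch; the start_position shown is an assumption about your use case):

```
input {
  file {
    path => "/data/tracelogger/tracelogger_data/**/PerfMon*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"   # no read positions persisted across restarts
    tags => ["perfmon"]
  }
}
```

With /dev/null as the sincedb, every restart re-reads all matched files from the beginning, so expect duplicates unless the index handles them.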

(Stephen Greszczyszyn) #8

@Badger

OK, something is definitely not working well with the wildcard patterns. I wonder if it is due to the depth of the path? I can't find any documentation about a maximum file path depth (although the Filebeat docs mention something about 8 levels).

Anyway, I can "find" my files if I use this pattern:
path => "/data/tracelogger/tracelogger_data/processed/AP4/10.66.132.23/2019-05-17_20-21-30/cm/log/ris/csv/PerfMon*"

But it doesn't work if I use this pattern (recursive ** wildcard):
path => "/data/tracelogger/tracelogger_data/**/PerfMon*"

Nor this one (explicit wildcards for the fixed path depth):
path => "/data/tracelogger/tracelogger_data/*/*/*/*/*/*/*/*/PerfMon*"
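As a sanity check outside Logstash, bash's globstar option implements similar ** semantics; a quick sketch against a throwaway tree (the paths below are stand-ins, not the real data directory):

```shell
#!/usr/bin/env bash
# Recreate a deep directory tree and confirm that a ** glob matches a file
# eight levels down, mirroring the pattern given to the file input's path.
set -e
shopt -s globstar nullglob
base=$(mktemp -d)
mkdir -p "$base/processed/AP4/10.66.6.21/2019-05-17_06-00-03/cm/log/ris/csv"
touch "$base/processed/AP4/10.66.6.21/2019-05-17_06-00-03/cm/log/ris/csv/PerfMon_test.csv"
matches=("$base"/**/PerfMon*.csv)
echo "matched ${#matches[@]} file(s)"
rm -rf "$base"
```

This prints "matched 1 file(s)", so plain shell globbing has no trouble at this depth; whatever is going wrong appears specific to how the file input expands the pattern.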

I guess I can try Filebeat and see if I can stream the files into Logstash to process them.
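If Filebeat does get tried, a minimal input sketch might look like the following (the Logstash host/port is hypothetical, and Filebeat's docs note that ** in paths is only expanded to a limited number of levels, which matches the "8" mentioned above):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /data/tracelogger/tracelogger_data/**/PerfMon*.csv

output.logstash:
  hosts: ["localhost:5044"]   # assumes a beats input listening on the Logstash side
```

On the Logstash side this would replace the file input with a beats input, with the csv filter and elasticsearch output unchanged.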

(Thirunavukkarasu Shanmugam) #9

I'm facing the same issue with wildcard patterns. I am trying to process some log files that are two weeks old. I am running Logstash 7.0.1 in Docker with the file input in "read" mode, since the log files are complete.

This is example directory structure

/backup_may2019/received_date=2019-05-05/received_time=02-00-00/
/backup_may2019/received_date=2019-05-05/received_time=03-00-00/

It works fine if I give the absolute path down to the last-level folder:

path => "/backup_may2019/received_date=2019-05-05/received_time=03-00-00/*"

It doesn't read the files if I use wildcards in the folder path:

path => "/backup_may2019/received_date=2019-05-05/*/*"
path => "/backup_may2019/received_date=2019-05-05/**/*"
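One workaround worth trying while the wildcard behaviour is unclear: the file input's path option accepts an array, so the known directories can be listed explicitly (a sketch using the two example directories above):

```
input {
  file {
    path => [
      "/backup_may2019/received_date=2019-05-05/received_time=02-00-00/*",
      "/backup_may2019/received_date=2019-05-05/received_time=03-00-00/*"
    ]
    mode => "read"
  }
}
```

This obviously doesn't scale to many timestamped folders, but it can confirm whether the problem is the glob expansion rather than the files themselves.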

1 Like
(Stephen Greszczyszyn) #10

Thanks so much for verifying the problem that I'm also experiencing. I'm going to try Filebeat next week, but I'm afraid I'll have the same problem. I'm not sure why there would be a directory depth limit.