Recursive depth of Logstash - not finding my files?

(Stephen Greszczyszyn) #1

I'm really having problems getting Logstash to find the csv files I need to parse. I'm wondering whether it's my configuration or whether there is a limit to the recursion depth.

test@host-001:/data/tracelogger/tracelogger_data# find . -name "PerfMon*.csv"
./processed/AP4/10.66.6.21/2019-05-17_06-00-03/cm/log/ris/csv/PerfMon_05_17_2019_04_25.csv
./processed/AP4/10.66.6.21/2019-05-17_01-45-23/cm/log/ris/csv/PerfMon_05_17_2019_00_55.csv
./processed/AP4/10.66.6.21/2019-05-17_06-59-20/cm/log/ris/csv/PerfMon_05_17_2019_06_10.csv
./processed/AP4/10.66.6.21/2019-05-17_00-00-07/cm/log/ris/csv/PerfMon_05_16_2019_23_10.csv

However, Logstash doesn't find them with this configuration:

input {
  file {
    id => "perfmon-in"
    path => "/data/tracelogger/tracelogger_data/**/PerfMon*"
    tags => ["perfmon"]
  }
}

filter {
  csv {
    id => "perfmon-filter"
    autodetect_column_names => true
    convert => {
      "column1" => "date_time"
    }
  }
}

output {
  if "perfmon" in [tags] {
    elasticsearch {
      hosts => "${ELASTICSEARCH_HOST}:${ELASTICSEARCH_PORT}"
      index => "perfmon-%{+YYYY.MM.dd}"
      id => "perfmon-out"
    }
    stdout { codec => rubydebug }
  }
}

And I can see logstash configuring the pipeline OK in the logs:

[2019-05-17T11:06:37,762][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/usr/share/logstash/data/plugins/inputs/file/.sincedb_d0460663a664248d37e0e08a24eedf1f", :path=>["/data/tracelogger/tracelogger_data/**/PerfMon*"]}
[2019-05-17T11:06:37,816][INFO ][logstash.javapipeline ] Pipeline started {"pipeline.id"=>"perfmon"}
[2019-05-17T11:06:37,826][DEBUG][logstash.javapipeline ] Pipeline started successfully {:pipeline_id=>"perfmon", :thread=>"#<Thread:0x4de86a24 run>"}

I have debug logs on but I never see logstash finding any files.

(Charlie) #2

Could you please try with the absolute path?
path => "/data/tracelogger/tracelogger_data/**/PerfMon*.csv"

Also can you verify that the Logstash has the permissions to read the files from the directory?

1 Like
#3

All the filewatch file handling is logged at trace level.
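One way to enable that selectively (a sketch, assuming the stock `log4j2.properties` in Logstash's config directory) is to add a logger for just the filewatch namespace instead of raising the global level:

```properties
# Hypothetical log4j2.properties additions: trace logging for the filewatch
# classes used by the file input, without making everything else noisier.
logger.filewatch.name = filewatch
logger.filewatch.level = trace
```

Alternatively, starting Logstash with `--log.level=trace` turns everything up at once, which needs no config change but produces far more output.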

1 Like
(Stephen Greszczyszyn) #4

OK, thanks, I'll set the logging to that level.

(Stephen Greszczyszyn) #5

I've put in the explicit path now (no ** glob expansion) and I still don't see the file being picked up:
path => "/data/tracelogger/tracelogger_data/processed/AP4/10.66.132.36/2019-05-17_21-29-18/cm/log/ris/csv/PerfMon*"

I also know for sure the file can be found. I'm using Docker, but the /data volume is shared between host and container. I entered the container and ran find . -name "PerfMon*" from the /data/tracelogger/tracelogger_data/processed/ path and could find the .csv files.

I'm going to set up trace logging level to see what I find. Super frustrating, yikes.

(Stephen Greszczyszyn) #6

OK, I put the specific path in and I can see Logstash "watching" the file that it found. However, I guess it thinks that it has already parsed the csv, as it says "no change" based on the "sincedb_key".

Thanks for your help so far, I'm going to try and wipe the "sincedb" and try again, also to see if I can see the proper ** glob expansion now that I'm using the "trace" debugs.

[2019-05-17T14:46:34,446][TRACE][filewatch.tailmode.processor] Active - no change {"watched_file"=>"<FileWatch::WatchedFile: @filename='PerfMon_05_17_2019_20_48.csv', @state='active', @recent_states='[:watched, :watched]', @bytes_read='2038603', @bytes_unread='0', current_size='2038603', last_stat_size='2038603', file_open?='true', @initial=false, @sincedb_key='252055458 0 2065'>"}

(Charlie) #7

Unless you need to track those files as they rotate over time, you can point sincedb_path at /dev/null, but after a potential Logstash crash you will ingest the data again.
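For completeness, the sincedb bypass described above looks something like this (a sketch; the start_position shown is an assumption about your use case):

```
input {
  file {
    path => "/data/tracelogger/tracelogger_data/**/PerfMon*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"   # no read positions persisted across restarts
    tags => ["perfmon"]
  }
}
```

With /dev/null as the sincedb, every restart re-reads all matched files from the beginning, so expect duplicates unless the index handles them.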

(Stephen Greszczyszyn) #8

@Badger

OK, something is definitely not working well with the wildcard patterns. I wonder if it is due to the depth of the path? I can't find any documentation about a maximum file path depth (although the Filebeat docs mention something about 8 levels).

Anyway, I can "find" my files if I use this pattern:
path => "/data/tracelogger/tracelogger_data/processed/AP4/10.66.132.23/2019-05-17_20-21-30/cm/log/ris/csv/PerfMon*"

But it doesn't work if I use this pattern (recursive ** wildcard):
path => "/data/tracelogger/tracelogger_data/**/PerfMon*"

Nor this one (explicit wildcards for the fixed path depth):
path => "/data/tracelogger/tracelogger_data/*/*/*/*/*/*/*/*/PerfMon*"
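As a sanity check outside Logstash, bash's globstar option implements similar ** semantics; a quick sketch against a throwaway tree (the paths below are stand-ins, not the real data directory):

```shell
#!/usr/bin/env bash
# Recreate a deep directory tree and confirm that a ** glob matches a file
# eight levels down, mirroring the pattern given to the file input's path.
set -e
shopt -s globstar nullglob
base=$(mktemp -d)
mkdir -p "$base/processed/AP4/10.66.6.21/2019-05-17_06-00-03/cm/log/ris/csv"
touch "$base/processed/AP4/10.66.6.21/2019-05-17_06-00-03/cm/log/ris/csv/PerfMon_test.csv"
matches=("$base"/**/PerfMon*.csv)
echo "matched ${#matches[@]} file(s)"
rm -rf "$base"
```

This prints "matched 1 file(s)", so plain shell globbing has no trouble at this depth; whatever is going wrong appears specific to how the file input expands the pattern.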

I guess I can try Filebeat and see if I can stream the files into Logstash to process them.
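If Filebeat does get tried, a minimal input sketch might look like the following (the Logstash host/port is hypothetical, and Filebeat's docs note that ** in paths is only expanded to a limited number of levels, which matches the "8" mentioned above):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /data/tracelogger/tracelogger_data/**/PerfMon*.csv

output.logstash:
  hosts: ["localhost:5044"]   # assumes a beats input listening on the Logstash side
```

On the Logstash side this would replace the file input with a beats input, with the csv filter and elasticsearch output unchanged.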

(Thirunavukkarasu Shanmugam) #9

I'm facing the same issue with wildcard patterns. I am trying to process some log files that are two weeks old. I am running Logstash 7.0.1 in Docker with the file input in "read" mode, since the log files are complete.

This is example directory structure

/backup_may2019/received_date=2019-05-05/received_time=02-00-00/
/backup_may2019/received_date=2019-05-05/received_time=03-00-00/

It works fine if I give the absolute path down to the last-level folder:

path => "/backup_may2019/received_date=2019-05-05/received_time=03-00-00/*"

It doesn't read the files if I use wildcards in the folder path:

path => "/backup_may2019/received_date=2019-05-05/*/*"
path => "/backup_may2019/received_date=2019-05-05/**/*"
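One workaround worth trying while the wildcard behaviour is unclear: the file input's path option accepts an array, so the known directories can be listed explicitly (a sketch using the two example directories above):

```
input {
  file {
    path => [
      "/backup_may2019/received_date=2019-05-05/received_time=02-00-00/*",
      "/backup_may2019/received_date=2019-05-05/received_time=03-00-00/*"
    ]
    mode => "read"
  }
}
```

This obviously doesn't scale to many timestamped folders, but it can confirm whether the problem is the glob expansion rather than the files themselves.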

1 Like
(Stephen Greszczyszyn) #10

Thanks so much for verifying the problem that I'm also experiencing. I'm going to try Filebeat next week, but I'm afraid I'll have the same problem. I'm not sure why there would be a directory depth limit.