I'm really having problems getting Logstash to find the csv files I need to parse. I'm wondering whether it is my configuration or whether there is a limit to the recursion depth. The tail end of my pipeline config looks like this:
    elasticsearch {
      hosts => "${ELASTICSEARCH_HOST}:${ELASTICSEARCH_PORT}"
      index => "perfmon-%{+YYYY.MM.dd}"
      id => "perfmon-out"
    }
    stdout { codec => rubydebug }
  }
}
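For reference, the input side of the pipeline is a file input pointed at a recursive glob. The original post does not show it, so the sketch below is an assumption reconstructed from the paths in the logs (the id and start_position settings are hypothetical):

  input {
    file {
      id => "perfmon-in"                                       # hypothetical id, not from the original post
      path => "/data/tracelogger/tracelogger_data/**/PerfMon*" # recursive glob, per the sincedb log line below
      start_position => "beginning"                            # assumption: read pre-existing files from the start
    }
  }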
And I can see logstash configuring the pipeline OK in the logs:
2019-05-17 12:06:37.762 [2019-05-17T11:06:37,762][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/usr/share/logstash/data/plugins/inputs/file/.sincedb_d0460663a664248d37e0e08a24eedf1f", :path=>["/data/tracelogger/tracelogger_data/**/PerfMon*"]}
2019-05-17 12:06:37.817 [2019-05-17T11:06:37,816][INFO ][logstash.javapipeline ] Pipeline started {"pipeline.id"=>"perfmon"}
2019-05-17 12:06:37.826 [2019-05-17T11:06:37,826][DEBUG][logstash.javapipeline ] Pipeline started successfully {:pipeline_id=>"perfmon", :thread=>"#<Thread:0x4de86a24 run>"}
I have debug logs on but I never see logstash finding any files.
I've now put in the explicit path (no ** glob expansion) and I still don't see the file being picked up:
path => "/data/tracelogger/tracelogger_data/processed/AP4/10.66.132.36/2019-05-17_21-29-18/cm/log/ris/csv/PerfMon*"
I also know for sure the file can be found. I'm using Docker, but the /data volume is shared between host and container. I entered the container and ran find . -name "PerfMon*" from the /data/tracelogger/tracelogger_data/processed/ path and it found the .csv files.
I'm going to set up trace logging level to see what I find. Super frustrating, yikes.
OK, with the specific path in place I can see Logstash "watching" the file that it found. However, I guess it thinks that it already parsed the csv, as it says "no change" based on the "sincedb_key".
Thanks for your help so far. I'm going to try wiping the "sincedb" and try again, and also check whether the ** glob expansion behaves properly now that I have "trace" logging on.
[2019-05-17T14:46:34,446][TRACE][filewatch.tailmode.processor] Active - no change {"watched_file"=>"<FileWatch::WatchedFile: @filename='PerfMon_05_17_2019_20_48.csv', @state='active', @recent_states='[:watched, :watched]', @bytes_read='2038603', @bytes_unread='0', current_size='2038603', last_stat_size='2038603', file_open?='true', @initial=false, @sincedb_key='252055458 0 2065'>"}
Unless you need to track those files as they rotate over time, you can point sincedb_path to /dev/null. But be aware that after a Logstash crash or restart, the data will be ingested again, since the read position is no longer persisted.
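In config terms, that suggestion looks like this sketch (assuming the rest of the file input stays as in the original pipeline):

  input {
    file {
      path => "/data/tracelogger/tracelogger_data/**/PerfMon*"
      sincedb_path => "/dev/null"   # don't persist read positions; files will be re-read after a restart
    }
  }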
OK, something is definitely not working well with the wildcard patterns. I wonder if it is due to the depth of the path? I can't find any documentation about a maximum file path depth (although the Filebeat docs mention that ** is only expanded up to 8 levels).
Anyway, I can "find" my files if I use this pattern: path => "/data/tracelogger/tracelogger_data/processed/AP4/10.66.132.23/2019-05-17_20-21-30/cm/log/ris/csv/PerfMon*"
But if I use this pattern it doesn't work (recursive ** wildcard): path => "/data/tracelogger/tracelogger_data/**/PerfMon*"
Nor this one (explicit wildcards for the fixed path depth): path => "/data/tracelogger/tracelogger_data/*/*/*/*/*/*/*/*/PerfMon*"
I guess I can try filebeat and see if I can stream the files into logstash to process.
I'm facing the same issue with wildcard patterns. I am trying to process some log files that are two weeks old. I am running Logstash 7.0.1 as a Docker container with the file input in "read" mode, since the log files are complete.
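For anyone else in the same situation, a file input in "read" mode for already-complete files looks roughly like this (a sketch: the path is hypothetical, and the file_completed_* settings are assumptions shown to avoid the default behavior of deleting finished files):

  input {
    file {
      path => "/data/logs/**/*.csv"     # hypothetical path, for illustration only
      mode => "read"                    # treat each file as complete and read it once
      file_completed_action => "log"    # assumption: log finished files instead of deleting them (the default)
      file_completed_log_path => "/usr/share/logstash/data/completed.log"
    }
  }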
Thanks so much for confirming the problem that I'm also experiencing. I'm going to try Filebeat next week, but I'm afraid I'll have the same problem. I'm not sure why there would be a directory depth limit.