Want to Process Multiple Files that exist in One Folder

Windows 7
Elasticsearch v1.7.0
Logstash v1.5.4

I have multiple files that I want to run through Logstash and have Logstash insert into Elasticsearch. Each of my files is an Apache access log. All of my files are in one folder and have similar file names.

Example file names (all files are in one folder, C:\logs):
access_log.2015-08-27
access_log.2015-08-29
access_log.2015-08-30

For some reason, when I run Logstash it starts (I see a "Logstash started" message in the DOS Console), but the log files are never parsed (I am writing to stdout so that I can monitor activity) and no data is populated into Elasticsearch. Why is Logstash not parsing my Apache access log files and populating Elasticsearch with documents? If I use a specific file name in the input "path", for example c:\logs\access_log.2015-08-27, then Logstash parses the file and populates Elasticsearch successfully.

Here is my logstash.conf file:

input {
  file {
    path => "c:\logs*.*"
    start_position => "beginning"
  }
}

filter {
  if [path] =~ "access" {
    mutate { replace => { type => "apache_access" } }
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      locale => "en"
      match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  } else if [path] =~ "error" {
    mutate { replace => { type => "apache_error" } }
  } else {
    mutate { replace => { type => "random_logs" } }
  }
}

output {
  elasticsearch {
    host => "localhost"
    protocol => "http"
    cluster => "bauer"
    index => "test"
  }

  stdout {
    codec => rubydebug
  }
}

Isn't there a backslash missing from the filename pattern? c:\logs*.* won't match any files in the c:\logs directory.

Thank you for your reply. Yes, in my post I forgot to include a backslash.

My corrected input below (with the backslash after C:\logs) still does not work, and I am not sure what is going on. All I see in my DOS Console is "Logstash startup completed". No files get parsed by Logstash and no documents are inserted into Elasticsearch. If I enter the entire file name and process the files one by one, Logstash parses the files and inserts documents into Elasticsearch successfully. Any ideas?

input {
  file {
    path => "c:\logs*.*"
    start_position => "beginning"
  }
}

I am not sure why the backslash after c:\logs is not showing in my post, but I do have the backslash after c:\logs in my logstash.conf file for the input path. Any ideas? What do I have to put in the input "path" value to have Logstash process all files in the c:\logs folder, for example all files whose names contain "access_log"?

Hi,

On Windows, you need to use forward slashes in the path to your log files.
Below is an input configuration that works fine on Windows and picks up all the matching files under the logs directory:

input {
  file {
    path => "c:/temp/logs/*.txt"
    type => "apachelogs"
  }
}
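
For your case the same idea would look something like the sketch below (assuming the access logs sit directly in c:\logs and their names all start with access_log; the exact glob is just an illustration):

input {
  file {
    # Forward slashes work on Windows; this glob matches access_log.2015-08-27 etc.
    path => "c:/logs/access_log.*"
    start_position => "beginning"
  }
}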

Thank you for your response. Changing the backslashes ("\") to forward slashes ("/") in my input file path worked.

I do have one small question. After I made the changes detailed above, the first time I ran Logstash from a DOS Command Prompt it parsed the input files and inserted documents into Elasticsearch. If I try to run Logstash a second time (using the same logstash.conf file, only changing the index name to a different index), Logstash just sits in the DOS Console telling me that Logstash startup has completed. Nothing else happens.

Why can I not rerun Logstash with the same logstash.conf file pointing to a different index? Any ideas?

Because the file input tracks the current position in each file and won't reread a file unless you force it to forget about it by deleting the sincedb file that contains the state information.

The documentation of this is limited but is somewhat improved in https://github.com/logstash-plugins/logstash-input-file/pull/61/files (a revision that hasn't reached https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html yet).

Ok. This makes sense. So if I want to rerun logstash and populate a different index, I need to first delete a file named sincedb?

I am trying to understand where the sincedb file is located.

The documentation says "By default, the sincedb file is placed in the home directory of the user running Logstash with a filename based on the filename patterns being watched". Where is the home directory of the user running Logstash?

Ok. This makes sense. So if I want to rerun logstash and populate a different index, I need to first delete a file named sincedb?

Yes, although it's not named exactly "sincedb".

The documentation says "By default, the sincedb file is placed in the home directory of the user running Logstash with a filename based on the filename patterns being watched". Where is the home directory of the user running Logstash?

For Windows I'm not sure, since there's no home directory concept in the same way as on Unix-like systems. I'd probably use Process Explorer on a running Logstash process to see what open files it has, or just search the file system for files with sincedb in the name.

Logstash requires either the HOME or the SINCEDB_DIR environment variable to be set, so if you can find out the value of either one for the user that Logstash runs as, that's another option.

Alternatively you can set an explicit sincedb path with the sincedb_path parameter.
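
For example, something along these lines (the folder and file name are just an illustration; any location Logstash can write to will do):

input {
  file {
    path => "c:/logs/access_log.*"
    start_position => "beginning"
    # Explicit, easy-to-find location for the position state;
    # delete this file to force a full re-read on the next run.
    sincedb_path => "c:/logstash/sincedb"
  }
}

If you never want the position remembered between runs, pointing sincedb_path at the null device ("NUL" on Windows, "/dev/null" on Unix-like systems) should also work, although I haven't verified that combination on 1.5.4.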

Thank you again for your help. After I specified a sincedb_path for my file input, Logstash created a file named sincedb in that folder once the parsing and import of documents was complete. Note that I did not create the sincedb file before running Logstash; I just had to specify where the folder and file should exist, and Logstash created the file.

input {
  file {
    path => "c:/logs/*"
    sincedb_path => "c:/logs/sincedb"
    start_position => "beginning"
  }
}

I was able to rerun logstash using the same logstash.conf file after I manually deleted the sincedb file from the c:/logs folder.
