I have Logstash configured to ingest S3 access logs. When I restart Logstash, it ingests the data into Elasticsearch without any issues. But when new files are added to the directory that Logstash pulls data from, Logstash does nothing. I am not sure what is going wrong. The permissions and everything else are correct. Below is my Logstash configuration.
Is there a reason I should use it as a replacement for file-based logging? I am not a Logstash expert, and I haven't tried it. Will using the s3 plugin fix my problem?
You should upgrade to at least LS 1.5.6, the last release of the 1.5 series. It uses version 0.6.7 of the Ruby filewatch library for the file input. filewatch 0.6.4 had a bug that lost file tracking info.
And I just looked: the 1.0.0 logstash-input-s3 plugin does not use the filewatch library. Instead, it downloads and processes the files directly (and has an option to save a copy of each file). This should work around the issue that you are seeing and provide the same level of functionality.
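For reference, a minimal sketch of what an s3 input might look like. The bucket name, region, prefix, backup directory, and type below are placeholders, not values from this thread, and AWS credentials are assumed to be supplied separately. Option names should be checked against the documentation for the plugin version you have installed.

```conf
input {
  s3 {
    # Placeholder bucket and region; replace with your own.
    bucket => "my-access-logs-bucket"
    region => "us-east-1"
    # Optional: only read objects whose key starts with this prefix (placeholder).
    prefix => "logs/"
    # Optional: keep a local copy of each processed file (placeholder path).
    backup_to_dir => "/var/backups/s3-access-logs"
    # Seconds to wait between checks of the bucket for new objects.
    interval => 60
    # Placeholder type, useful for routing events in filters.
    type => "s3-access-log"
  }
}
```

With this approach Logstash reads from the bucket itself, so the separate s3cmd download step and the local directory watch are no longer part of the pipeline.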
@jpcarey, is there a way to find out which version of the Ruby filewatch library my current file input plugin is using? I did update Logstash, but I believe the re-install didn't update the plugin.
I would assume it has something to do with how s3cmd downloads and creates the files. My guess is that it creates some temporary files during the download. It would take some in-depth troubleshooting to figure out exactly what is happening and why Logstash does not pick up the files (or possibly believes that it has already processed them).
You might try appending an extension to the finished file (if you can do this with s3cmd or control the S3 access log naming pattern). Then configure Logstash to look for *.my_file_type. It should then ignore any temporary files in the directory.
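As a rough sketch of that idea, the file input could be restricted to the finished-file extension with a glob in `path`. The directory and the sincedb location here are hypothetical, and `my_file_type` stands in for whatever extension you end up using.

```conf
input {
  file {
    # Hypothetical directory; only files with the finished-file extension match.
    path => "/data/s3-logs/*.my_file_type"
    # Read files that Logstash has no saved position for from their first line.
    start_position => "beginning"
    # Hypothetical location for the file that tracks read positions.
    sincedb_path => "/var/lib/logstash/sincedb-s3-logs"
  }
}
```

Files without the extension, such as anything written while a download is still in progress, would not match the glob and so would not be tracked.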
I tested this out today on a different box with the same setup as my EC2 instance. But on my local workstation it works perfectly fine and detects new files when s3cmd adds them to the directory.
I will try the s3 input and see if that takes care of the issue.