Webhdfs output plugin does not create new files


We have problems writing to HDFS via the webhdfs plugin. It does not create new files, but it will write to a file that already exists. This prevents using placeholders like date, time, host, and so on in the path.

Has anyone else seen this problem, and how do I solve it?



Providing your config as well as what version you are running would help.


Sorry. The config is pretty vanilla. The version is Logstash 2.1 with webhdfs pulled from GitHub (it was not included in the "all-plugins" version).

The error message was:

webhdfs write caused an exception: {"RemoteException":{"message":"Append failed for file: \/services\/logstash\/test.log, error: No such file or directory (2)","exception":"IOException","javaClassName":"java.io.IOException"}}. Maybe you should increase retry_interval or reduce number of workers. Retrying... {:level=>:warn, :file=>"logstash/outputs/webhdfs.rb", :line=>"191",

until I created the file with "hadoop dfs -touchz /services/logstash/test.log". After that it worked.
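For reference, the workaround sketched above can be done with the current (non-deprecated) CLI spelling; "hadoop dfs" is the old form of "hadoop fs". The path here is the one from the post; adjust it to your own setup:

```shell
# Pre-create an empty target file so subsequent appends succeed
hadoop fs -touchz /services/logstash/test.log

# Verify the file exists and is zero-length
hadoop fs -ls /services/logstash/test.log
```

This only helps for static paths, though; with dynamic path placeholders you cannot pre-create every file by hand, which is why the plugin failing to create files is the real problem.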

input {
 log4j {
  mode => server
  host => ""
  port => "4560"
  type => "log4j"
 }
}
output {
 webhdfs {
  host => "355.305.404.230"
  port => "14000"
  path => "/services/logstash/test.log"
  user => "hadoop"
 }
}

I face a similar issue, using the webhdfs plugin in Logstash to push a copy of the logs to HDFS.
It does create files with dynamic names (dd/mm/yy etc.), but writes only the first line. All subsequent logs that Logstash tries to push result in the same error as stated in this post.

When I use a pre-created file like test.log, it works smoothly. I went through some posts, and they suggest closing the HDFS file once it is created and reopening it for appending. I guess that is not working properly in the plugin.

Strangely, when I set the dfs.replication factor to 1, it works in all scenarios. I am using the HDP sandbox.
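That replication observation fits single-node behavior: on a sandbox with one DataNode, the HDFS append pipeline can fail because the client's datanode-replacement policy wants more DataNodes than exist, while replication factor 1 sidesteps it. A sketch of the relevant hdfs-site.xml fragment (single-node sandbox settings only, not production advice):

```xml
<!-- hdfs-site.xml: assumptions for a one-DataNode sandbox -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<!-- Alternatively, relax the append pipeline's datanode-replacement
     policy, which otherwise needs spare DataNodes on failure: -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
```

Either change requires restarting HDFS (or at least the affected clients) to take effect.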