The Logstash instance we're running operates 24 hours a day and processes millions of logs every day. It uses a file as a mapping dictionary through a Translate filter with the default refresh interval; the docs mention a default rate for refreshing the file, and that's what we have in the configuration file.
My question is about the architecture of the refresh process: if logs are continuously being processed, is the mapping file held under a lock and read continuously, or is it read into memory at each refresh interval, with that in-memory copy used for the mapping?
My use case is that the file will be updated on a weekly basis, but the updating process can only write to the file if it's not being read by another process (i.e. Logstash in this case). Any help is appreciated!
Thanks @Badger, you mean the file contents themselves? So if the file is being updated it doesn't matter, because the previous contents are already held in an in-memory hash, right?
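Just to check my mental model, is it roughly like this hypothetical sketch (made-up names, not the actual plugin code)? Lookups only ever touch the in-memory hash, and the file is read briefly once per refresh interval:

```ruby
require 'csv'

# Hypothetical sketch of the reload model, not the translate filter's
# real implementation: the dictionary is parsed into a Hash, and a
# background refresher swaps in a freshly parsed Hash every
# refresh_interval seconds. Lookups never read the file directly.
class DictionarySketch
  def initialize(path, refresh_interval)
    @path = path
    @refresh_interval = refresh_interval
    @mapping = load_file
    start_refresher
  end

  # Lookups hit the current in-memory Hash, never the file.
  def lookup(key)
    @mapping[key]
  end

  private

  def load_file
    # Parse "key,value" rows into a Hash.
    CSV.read(@path).to_h
  end

  def start_refresher
    Thread.new do
      loop do
        sleep @refresh_interval
        @mapping = load_file # swap in the freshly parsed contents
      end
    end
  end
end
```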
That makes sense, thanks! Would you know why I could be running into the following error:
[main] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<LogStash::Filters::Dictionary::DictionaryFileError: Translate: Unquoted fields do not allow \r or \n (line 1). when loading dictionary file at ...
What do you mean? Isn't that the normal way a CSV is defined? We tried it without the double quotes and got the exact same error, which is why we tried it with the quotes.
I mean a file that uses \r\n as a line ending on a machine that uses \n as a line ending. That would result in the Ruby CSV parser seeing a trailing \r on a field.
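You can reproduce it with Ruby's CSV parser directly. A minimal sketch (forcing row_sep to "\n" simulates a reader that expects Unix line endings; the exact error wording varies across csv gem versions):

```ruby
require 'csv'

# A dictionary line saved with a Windows-style "\r\n" ending, parsed with
# "\n" as the row separator: the parser treats the trailing "\r" as part
# of the unquoted field "bar" and rejects it.
begin
  CSV.parse("foo,bar\r\n", row_sep: "\n")
rescue CSV::MalformedCSVError => e
  puts e.message
  # e.g. "Unquoted fields do not allow \r or \n (line 1.)"
end
```

So converting the dictionary file to plain \n line endings should make the parser happy.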