The scenario is that I need to read files, make some changes and write the content to another file (say output_file) via filebeat. I also have another application (say app2) that does the process-and-remove thing on output_file.
I know it's strange, but I need it.
The problem is that when filebeat is writing output_file and app2 removes output_file mid-write, filebeat stops writing altogether.
For example, the original_file is 100MB. Filebeat reads original_file and writes it to output_file. When output_file grows to 50MB, app2 removes it, and the remaining 50MB gets lost.
I tried generating output_file with the script below instead, and it works well: all 10000000 records are processed.

    for ((i=0; i<10000000; ++i)); do echo $i >> output_file.log; done
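One plausible reason the test loop behaves differently: `echo $i >> output_file.log` opens, appends to, and closes the file on every iteration, so after app2 removes it, the next `echo` simply creates a fresh output_file.log. A long-running writer such as filebeat's file output presumably keeps the file open between writes instead (an assumption, not something confirmed in this thread). The sketch below reproduces the reopen-per-write behavior on a small scale; `reopen_demo` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Append COUNT lines one at a time, reopening the file on every write
# (like the test loop), and delete the file once after line REMOVE_AT
# to mimic app2.
reopen_demo() {
  local file=$1 count=$2 remove_at=$3 i
  for ((i = 1; i <= count; ++i)); do
    echo "$i" >> "$file"      # opens, appends, closes
    if ((i == remove_at)); then
      rm -f "$file"           # hypothetical app2 removes the file
    fi
  done
}

# Lines 1-3 disappear with the removed file; the next echo recreates
# output_file.log, so lines 4-10 still end up in it.
tmp=$(mktemp -d)
reopen_demo "$tmp/output_file.log" 10 3
```

Only the writes that happened before the removal are lost, because every later write opens the path again.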
The scenario as described is not really clear to me.
> read files, make some changes and write the content to another file

Which process is reading the file?
What kind of changes are we talking about?
> process-and-remove thing

process-and-remove thing? So is the aforementioned ETL process filtering out lines, or is it deleting files?
> The problem is that when the filebeat is writing the output_file and the output_file is removed by app2 while writing, filebeat stops to write any more.

So app2 does some post-processing on filebeat's output? Or does app2 run instead of filebeat? Why does app2 remove a file that filebeat is still writing to?
Under which conditions do you remove a file? Normally, removing a file only removes its directory entry (its metadata); as long as any process still holds the file open, the file is not really deleted yet. That is, you cannot just delete a file and expect the writing process to create a new one, as the OS does not report any kind of error or signal to that process.
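The unlink behavior described above can be demonstrated with plain bash file descriptors. In this sketch (the function name `held_open_demo` is hypothetical), fd 3 plays the role of a writer that keeps the file open, and fd 4 is a reader on the same file used only for inspection. After `rm`, the write through fd 3 succeeds without any error, and the data is still readable through fd 4, even though the name is gone from the directory.

```shell
#!/usr/bin/env bash
# Show that a process holding a file open keeps writing to it after the
# file's directory entry is removed (POSIX unlink semantics).
held_open_demo() {
  local file=$1 line
  exec 3> "$file"               # writer keeps fd 3 open, like a long-running process
  exec 4< "$file"               # a reader on the same file, for inspection
  rm -f "$file"                 # "app2" deletes the name; the inode survives
  echo "written after rm" >&3   # no error: the write goes to the unlinked inode
  read -r line <&4              # the data is still readable through the open fd
  exec 3>&- 4<&-                # closing the last fds finally frees the space
  [ -e "$file" ] && echo "file visible" || echo "file gone"
  echo "$line"
}

tmp=$(mktemp -d)
held_open_demo "$tmp/output_file"
# Prints:
# file gone
# written after rm
```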
The option given by maddin2016 incorporates file rotation. The idea is to rely on rotation to tell you when filebeat has finished writing to a log file.
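A minimal sketch of the app2 side of the rotation approach, assuming filebeat's file output is configured to rotate (for example via `path`, `filename` and `rotate_every_kb` under `output.file`) and that, as described in the 7.x documentation, the active file is named `filebeat` while rotated files are `filebeat.1`, `filebeat.2`, and so on. Check the naming scheme for your version. `collect_rotated` and the `process_file` placeholder are hypothetical. Each rotated file is first moved into a staging directory on the same filesystem, so a later rotation (which shifts `filebeat.1` to `filebeat.2`) cannot rename a file while app2 is working on it. How filebeat reacts when rotated files disappear has not been verified here.

```shell
#!/usr/bin/env bash
# Only rotated files (NAME.1, NAME.2, ...) are touched; the active file
# NAME, which filebeat may still be writing to, is left alone.

process_file() {
  # Hypothetical placeholder for app2's real processing.
  wc -l < "$1" > /dev/null
}

collect_rotated() {
  local dir=$1 name=$2 staging=$3 f suffix staged
  local -a suffixes=()
  mkdir -p "$staging"
  for f in "$dir/$name".*; do
    suffix=${f#"$dir/$name."}
    [[ -f $f && $suffix =~ ^[0-9]+$ ]] && suffixes+=("$suffix")
  done
  # Highest suffix = oldest data, so handle those first.
  for suffix in $(printf '%s\n' "${suffixes[@]}" | sort -nr); do
    staged="$staging/$name.$suffix.$$"
    # A same-filesystem rename is atomic; skip files that vanished meanwhile.
    mv "$dir/$name.$suffix" "$staged" 2>/dev/null || continue
    # Failed files stay in staging for inspection.
    process_file "$staged" && rm -f "$staged"
  done
}

# Usage (e.g. from cron), keeping staging on the same filesystem:
#   collect_rotated /var/log/fb-out filebeat /var/log/fb-out/.staging
```

Note that with `number_of_files` set, filebeat deletes the oldest rotated files once the limit is reached, so app2 has to keep up with the rotation rate.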
As an alternative to relying on rotation in filebeat, you can configure filebeat to send events to stdout. Then you can process or pipe the output straight from the stream, however you want.
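With `output.console` enabled (and `pretty` left off), filebeat should write each event to stdout as one JSON line, so a consumer can read the stream directly. The sketch below is a hypothetical consumer, `batch_events`, that groups events into complete files and only renames each file into place once it is finished, so app2 never sees a half-written file. The invocation in the usage comment assumes `-e` sends filebeat's own logs to stderr, keeping stdout for events; the `batch-N.ndjson` naming is illustrative.

```shell
#!/usr/bin/env bash
# Read newline-delimited events from stdin and write them to OUTDIR in
# batches of SIZE lines. Each batch goes to a temp file first and is
# renamed only when complete, so readers never see a partial batch.
batch_events() {
  local outdir=$1 size=$2 n=0 batch=0 line
  local tmp="$outdir/.batch.tmp"   # dot-prefixed so app2 can ignore it
  mkdir -p "$outdir"
  : > "$tmp"
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\n' "$line" >> "$tmp"
    if (( ++n == size )); then
      mv "$tmp" "$outdir/batch-$batch.ndjson"
      (( ++batch ))
      n=0
      : > "$tmp"
    fi
  done
  # Flush a final partial batch when the stream ends.
  if (( n > 0 )); then
    mv "$tmp" "$outdir/batch-$batch.ndjson"
  else
    rm -f "$tmp"
  fi
}

# Usage (assumes output.console is enabled in filebeat.yml):
#   filebeat -e -c filebeat.yml | batch_events /data/batches 1000
```

Batch numbers restart at 0 each time the consumer starts, so a real setup would want unique names (for example timestamps) to avoid overwriting batches app2 has not processed yet.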
Well, as I already wrote, you can't just delete a file that a process is still writing to. The file still exists under the hood; you just can't find it in the directory structure anymore. That is, you have to wait to delete the file until filebeat has finished writing to it...
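One way for app2 to "wait until filebeat has finished writing" is to check whether any process still holds the file open before processing and removing it. This sketch is Linux-specific: it scans `/proc/*/fd` and uses bash's `-ef` test (same device and inode) to match open descriptors against the file. It only sees descriptors of processes you are allowed to inspect (root sees all), and `is_open` / `remove_when_closed` are hypothetical helpers; `fuser` or `lsof` do the same job more robustly. Since filebeat's file output presumably keeps its active file open until it rotates, in practice this would wait for rotation.

```shell
#!/usr/bin/env bash
# Return 0 if some inspectable process has FILE open, else 1.
# Linux-only: relies on the /proc/<pid>/fd/* links.
is_open() {
  local target=$1 fd
  for fd in /proc/[0-9]*/fd/*; do
    # -ef compares device and inode, following the /proc link.
    [[ $fd -ef $target ]] && return 0
  done
  return 1
}

# Wait until nothing holds FILE open, then run HANDLER on it and remove it.
remove_when_closed() {
  local file=$1 handler=$2 interval=${3:-1}
  while is_open "$file"; do
    sleep "$interval"
  done
  "$handler" "$file" && rm -f "$file"
}
```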