Hello again. I am testing Logstash options and have set up two pipelines. The first one creates files with this configuration:
Then I have a second pipeline configured to process every line as a single event, so that I can pass in the document_id parameter (this is the best way I could think of to make it work):
When the two pipelines run one after the other, creating the files first and then activating the second pipeline to process them, every line is inserted correctly. The problem starts when I run the two pipelines simultaneously.
It looks like not all the lines are picked up (maybe because the files start being read before they are closed?).
Do you have any explanation for why this is happening, and any suggestions for reading the files correctly when the two pipelines are working simultaneously?
@leandrojmp technically, the file is written in one shot and not updated after the first write. I have another Logstash configuration using http output and it works perfectly:
The file is the output of a pipeline that has an http input and then read by another pipeline with the file input in the read mode.
It is possible that the file input reads the file before it has been completely written, and then it will not read it again.
The read mode works on files whose content is already complete when Logstash starts reading them; this is not your case.
You need to use the tail mode so that Logstash picks up new lines, but in tail mode Logstash cannot delete the file, so you will need to delete it using another tool.
Also, you seem to be on Windows; is log_streaming a network share? Network shares can also be problematic.
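As a rough illustration of the tail-mode suggestion above, a file input might look something like the sketch below. The path is hypothetical (the real location of log_streaming is not shown in this thread), and `start_position => "beginning"` is an assumption about wanting files that Logstash has not seen before to be read from their first line.

```
input {
  file {
    # Hypothetical path, not taken from the original configuration.
    # Forward slashes are used here even though the host is Windows.
    path => "C:/log_streaming/*.log"

    # Tail mode keeps watching each file and picks up lines appended later,
    # instead of treating the file as complete on first read.
    mode => "tail"

    # Assumption: read files with no recorded position from the start.
    start_position => "beginning"
  }
}
```

Since completed files are not deleted in this mode, cleaning up old files would have to be handled outside Logstash.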
Thank you for the answer @leandrojmp.
I am studying all the possibilities that Logstash offers, and the folder log_streaming is just a local folder on my PC.
So, do you think that it's just pure luck that in the http output plugin example all documents are sent correctly?
I will try using tail mode, but I will also check whether there is an interval for discovering new files, so that the system has time to write them, or whether Logstash needs time to process the other files chronologically before the latest files are written to disk.
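For reference, the file input has two settings that govern how often it looks for files: `stat_interval` (how often watched files are checked for changes) and `discover_interval` (how often the path pattern is expanded to find new files, expressed as a multiple of `stat_interval`). A minimal sketch with a hypothetical path and purely illustrative values:

```
input {
  file {
    # Hypothetical path, not taken from the original configuration.
    path => "C:/log_streaming/*.log"
    mode => "tail"

    # Check watched files for changes once per second.
    stat_interval => "1 second"

    # Look for new files every 30 stat intervals, i.e. about every 30 seconds.
    discover_interval => 30
  }
}
```

A longer discovery interval only narrows the window in which a half-written file can be picked up; it does not guarantee that the writer has finished.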