I'm parsing a lot of old log files. All the logs are gzipped, so I have to uncompress them and then move them to a folder watched by Filebeat.
Out of about 30 million entries, I get about 1300 failures in the Logstash logs. I'm logging the messages, so I can see that Logstash received a partial line; the line is truncated at a random position.
I double-checked that my logs don't contain any special characters or anything similar. So why is Filebeat sending partial lines?
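For context, the ingestion step described above can be sketched roughly like this (the directory layout and an atomic-rename step are my assumptions, not details from the thread):

```shell
# Sketch: decompress archived .gz logs and move the plain-text result into
# the directory Filebeat watches. Directory names here are placeholders.
ingest_logs() {
  archive_dir=$1
  watch_dir=$2
  for gz in "$archive_dir"/*.gz; do
    [ -e "$gz" ] || continue                  # skip when the glob matches nothing
    out="$watch_dir/$(basename "$gz" .gz)"
    gunzip -c "$gz" > "$out.tmp"              # decompress to a temporary name first
    mv "$out.tmp" "$out"                      # rename within the same filesystem, so
  done                                        # Filebeat never picks up a half-written file
}
```

Writing to a temporary name and renaming at the end avoids one classic source of partial lines: Filebeat reading a file while it is still being written.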
Thanks for the data. Could you provide an example of a message that was truncated? I was also hoping to see the Filebeat logs. Do you see anything special in there?
Is the volume you read the logs from a shared drive that is somehow mounted, or a local disk?
If you write the output to a file instead of Logstash, do you still see it happening?
I'm now on 5.3 for the whole stack except Filebeat, which is still on 5.2.2.
I'm running Filebeat on a single node as a Docker container, but this issue already existed when Filebeat was running directly on the host.
The disk is not an SSD but a SATA RAID 5 array.
Filebeat is streaming to 3 Logstash nodes (a container on the same node and 2 remote ones).
I will try some tests writing to disk directly today or tomorrow.
Could it be that your partial lines actually come from another file because the inode is being reused? We had a similar case here: https://github.com/elastic/beats/issues/714#issuecomment-295329605 If that is the case, I recommend first moving the files to another place so the registry gets cleaned up, and then removing the files later. This prevents the inode reuse.
Hi, sorry for the late answer; I was waiting to be sure that the issue was resolved. I'm now moving finished logs to a tmp directory instead of deleting them, and I think that solved my problem.
Instead of the inode, wouldn't it be possible to use the file path? Or make it configurable for users like me who don't parse logs in real time?
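The workaround adopted here can be sketched as follows (the holding-directory name and the cleanup comment are assumptions for illustration):

```shell
# Sketch: once Filebeat has finished reading a file, move it to a holding
# directory instead of deleting it right away. The file keeps its inode, so
# the filesystem cannot hand that inode to a new log file while Filebeat's
# registry still references it.
retire_log() {
  file=$1
  hold_dir=$2
  mkdir -p "$hold_dir"
  mv "$file" "$hold_dir/"   # inode stays allocated under the new path
  # Actually delete the held files later (e.g. from a cron job), once the
  # registry entry for the old path has been cleaned up.
}
```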
About using the path as the identifier instead of the inode: agreed. This should be a configurable option, or even a separate prospector type, for example `file`, where it is assumed that files are never renamed and data is never appended. Feel free to open a feature request for this on GitHub.