Filebeat not detecting all changes made to a CSV file


(Ganessen Mootheeveeren) #1

Hello,

I am having difficulties with Filebeat.

In my current setup, Filebeat (installed on a Windows server) sends data (in CSV format) to Logstash and then on to Elasticsearch (both installed on a Linux server).
The CSV file is generated by a PowerShell script, which is launched during the execution of a job.
This job runs several times a day, and each run adds more lines to the CSV file, at any position in the file.
Sometimes lines that are present in the CSV file are missing from the documents indexed in Elasticsearch.

CSV file at a certain time during the day:
A;B;C;D;E;F;G
B;D;F;F;G;H;I
C;D;F;G;G;H;L
E;A;A;V;F;G;H
F;G;R;F;G;R;R

CSV file at a later time, with new lines inserted during the day:
A;B;C;D;E;F;G
B;D;F;F;G;H;I
C;D;F;G;G;H;L
D;E;F;G;A;D;D
E;A;A;V;F;G;H
F;G;R;F;G;R;R
G;A;A;B;B;E;G

So, my questions are as follows:

• When a new file is added to the Filebeat directory for ingestion, a harvester is started for the file. Does the harvester detect all changes made to the CSV file even though the new lines are not necessarily inserted at the end of the file?
• How can we verify that the harvester has processed every line in the CSV file?

Thanks in advance.
Ganessen.


(Christian Dahlqvist) #2

Filebeat tails the file, so it will only capture lines appended to the end of the file.
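
To illustrate why tailing misses lines inserted mid-file, here is a minimal Python sketch of offset-based reading. It is a simplified model and not Filebeat's actual implementation; `read_new_lines` and the shortened three-column rows are hypothetical and chosen only for the demonstration.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return the complete lines found after byte `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline so an unfinished line is left for later.
    end = data.rfind(b"\n") + 1
    return data[:end].decode("utf-8").splitlines(), offset + end


path = os.path.join(tempfile.mkdtemp(), "data.csv")

# First version of the file: everything is read, the offset moves to the end.
with open(path, "wb") as f:
    f.write(b"A;B;C\nC;D;F\n")
first, offset = read_new_lines(path, 0)

# The file is rewritten with a line inserted in the middle.
with open(path, "wb") as f:
    f.write(b"A;B;C\nB;D;F\nC;D;F\n")
second, offset = read_new_lines(path, offset)

print(first)   # ['A;B;C', 'C;D;F']
print(second)  # ['C;D;F'] -> a repeat of an old line; 'B;D;F' is never seen
```

In this model the reader only looks at bytes past the stored offset, so the inserted row is skipped and an already-shipped row is read again.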


(Ganessen Mootheeveeren) #3

Thanks for the reply Christian.

So my only solution would be to make sure that all new lines are appended at the end of the file?

I am also seeing incomplete data being ingested.
For example:
Line in the CSV file:
apple;kiwi;mango;pamplemousses;pear;orange;cost;selling_price;profit_made

Line obtained in Elasticsearch:
orange;cost;selling_price;profit_made

Note that I am using the semicolon as the separator in my Logstash config file.

filter {
  csv {
    separator => ";"
  }
}

Do you have any idea what would cause an issue like that?

Thanks,
Ganessen.


(Christian Dahlqvist) #4

If your process is currently modifying the file rather than strictly appending to it, I would look into exactly how new lines are added. If part of a line is added in several steps, potentially followed by a newline, I guess it is possible Filebeat could catch this before the full line has been written. Make sure that each line is written in one step to avoid potential problems.
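
As a rough illustration of writing each line in one step, here is a minimal Python sketch (the generating script in this thread is PowerShell, so this only shows the idea). The helper `append_row` is hypothetical, and it assumes that no field contains the separator or a line break.

```python
import os
import tempfile


def append_row(path, fields, sep=";"):
    """Append one row with a single write call, trailing newline included."""
    # Build the complete line in memory first. Fields are assumed not to
    # contain the separator or line breaks (no quoting is done here).
    line = sep.join(fields) + "\n"
    with open(path, "a", encoding="utf-8", newline="") as f:
        f.write(line)          # one write for the whole line
        f.flush()
        os.fsync(f.fileno())   # ask the OS to push it to disk before returning


demo_path = os.path.join(tempfile.mkdtemp(), "data.csv")
append_row(demo_path, ["A", "B", "C", "D", "E", "F", "G"])
append_row(demo_path, ["B", "D", "F", "F", "G", "H", "I"])
```

The point of the design is that the reader should never be able to observe a row without its terminating newline.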


(Ganessen Mootheeveeren) #5

Hello,

I checked: each time the script is launched, it recreates the file and overwrites the old one, so the new lines are not necessarily at the end of the file. That is why Filebeat has not detected the newly inserted lines.

Before I modify the script to insert all "new" lines at the very end of the file: is there any way for Filebeat to re-analyse all lines again, like a for loop, and add only the new ones?

Concerning my second issue, I have written the script so that the CSV file is transferred to the Filebeat directory only after the script has finished creating it.

Copy-Item -Path $Log_creation_directory -Destination $Filebeat_directory

Could copying a file into the Filebeat directory and overwriting the existing one, while Filebeat is still working with it, cause an issue like that?

Thanks,
Ganessen.


(Christian Dahlqvist) #6

This might explain why you are getting partial lines as well. Filebeat uses the file offset to keep track of what has been processed, so if a line earlier in the file suddenly increases in size, the stored offset can end up in the middle of a line and a partial line could be read.
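
A small Python sketch of how a stored offset could produce exactly the truncated line reported above. The earlier version of the line (`old_content`) is an assumption made up for the illustration; the thread does not say what the file looked like before it was overwritten.

```python
# Hypothetical earlier version of the line, already read up to its end.
old_content = "apple;kiwi;mango;pamplemousses;pear\n"
# The line as it appears after the file was overwritten.
new_content = "apple;kiwi;mango;pamplemousses;pear;orange;cost;selling_price;profit_made\n"

# Offset remembered after the first read: the byte length of the old content.
offset = len(old_content.encode("utf-8"))

# Resuming at that offset in the rewritten file lands in the middle of the line.
resumed = new_content.encode("utf-8")[offset:].decode("utf-8")
print(repr(resumed))  # 'orange;cost;selling_price;profit_made\n'
```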

Filebeat is not designed to re-analyse a file and pick out only the new lines, so I do not believe that is possible.

I think altering the prior content may be at fault here, rather than the method used to copy the file.
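
One possible way to avoid altering prior content is to have the generating script append only the rows that the shipped file does not contain yet. The following is a minimal Python sketch of that idea, not a tested solution for this setup: `append_only_new` is a hypothetical helper, and it assumes every row is unique, so an identical row is treated as already shipped.

```python
import os
import tempfile


def append_only_new(shipped_path, fresh_lines):
    """Append the lines of `fresh_lines` that `shipped_path` does not hold yet."""
    try:
        with open(shipped_path, "r", encoding="utf-8") as f:
            seen = {line.rstrip("\r\n") for line in f}
    except FileNotFoundError:
        seen = set()
    new_lines = []
    for line in fresh_lines:
        if line not in seen:
            seen.add(line)
            new_lines.append(line)
    if new_lines:
        # Existing content is never touched; new rows go to the end in one write.
        with open(shipped_path, "a", encoding="utf-8", newline="") as f:
            f.write("".join(line + "\n" for line in new_lines))
    return new_lines


shipped = os.path.join(tempfile.mkdtemp(), "data.csv")
morning = ["A;B;C;D;E;F;G", "B;D;F;F;G;H;I", "C;D;F;G;G;H;L",
           "E;A;A;V;F;G;H", "F;G;R;F;G;R;R"]
later = ["A;B;C;D;E;F;G", "B;D;F;F;G;H;I", "C;D;F;G;G;H;L", "D;E;F;G;A;D;D",
         "E;A;A;V;F;G;H", "F;G;R;F;G;R;R", "G;A;A;B;B;E;G"]

append_only_new(shipped, morning)
added = append_only_new(shipped, later)
print(added)  # ['D;E;F;G;A;D;D', 'G;A;A;B;B;E;G']
```

With the sample data from the first post, the two rows added later in the day end up at the end of the shipped file, which is where a tailing reader looks for them.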

