Logstash is missing real-time logs

hello!!

I'm reading logs with the file input plugin. When reading older logs, Logstash performs very well, but while reading live logs it misses around 2% of the lines.

My files arrive via FTP on a Solaris 10 server, and Logstash runs on that same server. I'm using an ingest pipeline to process the files.

Logstash config:

input {
  file {
    path => "/xpool/home/user/DUMP/*file1"
    sincedb_path => "/xpool/logstash-6.1.1/logs/sinceDB/sinceDB_file1_ext"
    id => "file1_input"
    add_field => { "request_type" => "file1" }
    exclude => ["*.gz"]
    max_open_files => 36000
    start_position => "beginning"
  }
  file {
    path => "/xpool/home/user/DUMP/*file2"
    sincedb_path => "/xpool/logstash-6.1.1/logs/sinceDB/sinceDB_file2_ext"
    id => "file2_input"
    add_field => { "request_type" => "file2" }
    exclude => ["*.gz"]
    max_open_files => 36000
    start_position => "beginning"
  }
}

output {
  if [request_type] == "file1" {
    elasticsearch {
      id => "output_file1"
      hosts => "http://10.10.23.180:9200"
      pipeline => "provisioning_file1"
    }
  } else if [request_type] == "file2" {
    elasticsearch {
      id => "output_file2"
      hosts => "http://10.10.23.180:9200"
      pipeline => "provisioning_file2"
    }
  }
}

There are around 10M lines for file1 and 30M lines for file2. Also, sometimes Logstash starts reading a file from the middle of a line.

Example:
Original message:

4,10000000,10000000,1777369722,6.120000,08:59:29,air82,Bus,somenode,20180306 08:59:29,,,,20180306,-1.700000,,53,request_data,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,OUTPUTCDR_4007_4532_20180306-085922.file1,,,,,,,,,,,,,,,,,,,,,36529660980845502080,

Logstash message field:

"message": "_4532_20180306-085922.file1,,,,,,,,,,,,,,,,,,,,,36529660980845502080,"

My files arrive via FTP

Exactly what does this mean?

Our application team pushes the files to the mentioned Solaris server using FTP every 5 minutes. I don't know why I mentioned that; maybe it's not important how the files arrive.

This is most likely the reason you're seeing this behavior. Pushing log files via FTP the way you describe is fragile and can lead to exactly the kind of problems you describe: Logstash's file input may read a file while FTP is still writing it, so it can pick up partial lines or resume from a stale offset.

Will rsync work better than FTP? I'm asking because I tried moving the files to a different server using rsync, with Filebeat doing Logstash's job, and it performed well.

Also, on another development server we tried Filebeat with the same FTP process and it worked well there too. So why is Logstash showing this behavior?

Also, is there any way I can tell Logstash to start processing files based on their modification time?

Will rsync work better than FTP?

No.

Also, on another development server we tried Filebeat with the same FTP process and it worked well there too. So why is Logstash showing this behavior?

Perhaps because Filebeat and Logstash behave slightly differently, and perhaps because there's an inherent race between the FTP process and Logstash, so log volume, machine performance, etc. matter.

Also, is there any way I can tell Logstash to start processing files based on their modification time?

It's not clear exactly what you mean by that, but I'm pretty sure the answer is no.

What you should do is run Filebeat on the machines where the logs originate. If that's not possible, you may have to build something yourself that figures out what changed between two FTP or rsync copy operations and passes those changes to Logstash; a rough sketch of that idea follows.
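Not something from this thread, just a minimal sketch of the "build something yourself" idea, assuming Python is available on the box and that the copied files only ever grow between copy operations (they are never truncated or replaced in place). It remembers the last-read byte offset per file in a small state file and prints only the complete new lines, which you could then feed to Logstash. All paths and names here are hypothetical.

#!/usr/bin/env python
# Hypothetical sketch: emit only the complete new lines appended to each log
# file since the last run, tracking per-file byte offsets in a JSON state file.
# Assumes files only grow between runs; does not handle truncation or rotation.
import glob
import json
import os
import sys

STATE_FILE = "/xpool/home/user/DUMP/.offsets.json"  # hypothetical state file
LOG_GLOB = "/xpool/home/user/DUMP/*file1"            # same pattern as the file input

def load_state():
    try:
        with open(STATE_FILE) as f:
            return json.load(f)
    except (IOError, ValueError):
        return {}

def save_state(state):
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def main():
    state = load_state()
    for path in glob.glob(LOG_GLOB):
        offset = state.get(path, 0)
        size = os.path.getsize(path)
        if size <= offset:
            continue  # nothing new since the last run
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read(size - offset)
        # Forward only up to the last newline, so a line that is still being
        # written is left for the next run instead of being emitted half-read.
        last_newline = chunk.rfind(b"\n")
        if last_newline == -1:
            continue
        sys.stdout.write(chunk[:last_newline + 1].decode("utf-8", "replace"))
        state[path] = offset + last_newline + 1
    save_state(state)

if __name__ == "__main__":
    main()

You could run something like this from cron right after each copy finishes and pipe its output into a Logstash stdin or tcp input, or append it to a separate "clean" file that the file input tails.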

I wanted to run Filebeat, but there are two issues.

First, it's a Solaris server and I can't run Filebeat there.

Second, Filebeat works great on an Ubuntu server, but when I tried it from Red Hat it was too slow: at most 500 TPS compared to 3000 from Ubuntu. I can't find out the reason. :frowning:
