Read From Top of File

Is there a way to configure Filebeat to read a log file from bottom to top? I have a log that is inverted: the most recent records appear at the top and the oldest records are at the bottom.


  1. What application or method produces that file?
  2. Why/how do you know that new content is appended at the top?
  3. Is it being constantly updated with new content (at the top) like a log file while you want Filebeat to read it, or is it final once created and never updated again?
  4. Is there a timestamp on every log line in that file?
  5. Do you want to read it from the top because you currently see them in reverse order when viewing them in kibana?
  6. Or is it because you see a lot of duplicates when you harvest that file with Filebeat?
  7. Is it rotated like a log file can be rotated such that files which are no longer being updated have a different name than the active file?

Depending on what that file is and how it is produced, you won't be able to use Filebeat to ingest it live while it is still being updated at the top. You would have to wait for it to be final before Filebeat can ingest it.

I think a file updated at the top is actually a file being completely rewritten each time, with the new content placed at the top. While not impossible, it is unusual. Can you elaborate with answers to the questions above? You haven't given enough information to establish whether and how you could successfully harvest such a file with Filebeat.

Actually, I see a way to do it, but it is incredibly hackish: you would have to either completely clear the Filebeat registry, or remove the specific entry for that file from the registry, while restarting Filebeat on a constant and frequent basis. (Doable with systemd on Linux with minimal work: timers and scripts as service units.)
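A rough sketch of that coordination with systemd, assuming the default Linux package layout (registry under /var/lib/filebeat and a filebeat.service unit); the unit names and the five-minute interval are made-up placeholders, not a recommendation:

```
# /etc/systemd/system/filebeat-reharvest.service
# One-shot unit: stop Filebeat, wipe its registry so the file is
# re-read from the beginning, then start Filebeat again.
[Unit]
Description=Clear Filebeat registry and restart Filebeat

[Service]
Type=oneshot
ExecStart=/bin/systemctl stop filebeat
ExecStart=/bin/rm -rf /var/lib/filebeat/registry
ExecStart=/bin/systemctl start filebeat

# /etc/systemd/system/filebeat-reharvest.timer
# Fires the service above on a fixed schedule.
[Unit]
Description=Periodically re-harvest the top-updated file

[Timer]
OnBootSec=5min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target
```

You would enable it with `systemctl enable --now filebeat-reharvest.timer`.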

Additionally, you would have to use Logstash to compute a fingerprint and deduplicate the events, because each event would be harvested multiple times. (You would be constantly re-harvesting the file completely from the top to get the new content into the pipeline.)
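A minimal Logstash sketch of that fingerprint-based deduplication, assuming the raw line lives in the standard `message` field (the hosts, index name, and key are placeholders):

```
filter {
  # Hash the raw log line; store the hash in @metadata so it is
  # not indexed as part of the document itself.
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA256"
    key    => "any-long-random-string"
  }
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "mylogs-%{+YYYY.MM.dd}"
    # Re-harvested copies of the same line get the same _id,
    # so they overwrite the existing document instead of duplicating it.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```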

If the events already have a unique identifier, then you could simply have Logstash use that unique id as the document id, so that Elasticsearch's "insert or update" or "create if absent" indexing behavior handles the deduplication.
I think an ingest node pipeline could also do that without Logstash, but I'm not sure at this time.
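For illustration, assuming each event carries a unique field named `event_uid` (a made-up name), the Elasticsearch output could look like this; `action => "create"` gives the "create if absent" behavior, while the default `"index"` gives "insert or update":

```
output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "mylogs-%{+YYYY.MM.dd}"
    document_id => "%{event_uid}"
    action      => "create"   # reject duplicates; use "index" to upsert instead
  }
}
```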

This would require external coordination that you would have to put in place; it is hackish, wastes resources, etc. But it could still be doable if really required.

It's a small switch topology mapping program, LanTopoLog

Because the most recent timestamps occur at the top.

It is the application's log file for alerts.


The file is updated in a weird manner: new events are ingested when the file is updated, but they result in a grok or date parsing failure.

I utilize fingerprinting in the id field to prevent duplication.


That's exactly what I am doing at the moment, lol.

I emailed the program dev, and he released a build that lets you define the order in which the log file is updated. I'll give that a go and see what comes of it.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.