Logstash collects the same logs repeatedly

  1. My log lines look like this: [23/Nov/2023:14:12:37 +0800] | gateway | [http] | 195.161.250.29 | POST /gateway/xxx HTTP/1.1 | 9090 | 200 | 182 | sss67jhsd | 0 | gateway | - | - | - | - | Java/1.8.0_362
  2. Logstash (version 7.4.2) configuration:
```
input {
  file {
    path => ["/gateway/logs/access/*.log"]
    ignore_older => 86400
    type => "acclog"
    max_open_files => 65536
    sincedb_path => "/gateway/config/sincedb_path/sincedb"
  }
}
output {
  if [type] == "acclog" {
    kafka {
      retries => 2
      bootstrap_servers => "kafka2:9092"
      topic_id => 'acclog'
    }
  }
}
```
  3. Problem description:
    I have explicitly set ignore_older => 86400, but logs are still collected repeatedly. How can I solve this?

Can you provide more context about this? Could you share some examples of the data?

The ignore_older setting makes Logstash ignore any file that has not been modified within the specified time (in this case 86400 seconds), but once that file is modified again it is no longer ignored.

Do you have files older than 86400 seconds that Logstash did not ignore? Can you provide some evidence, such as the lines from those files and the output of the stat /path/file.log command on Linux?

Just to add: maybe the logstash user doesn't have write permission on the sincedb file. Check the permissions.

  1. Oh, if it's atime, that timestamp keeps changing.
  2. Apart from the ignore_older parameter, is there any other way to collect only the current day's logs?

permissions:
-rw-r--r-- 1 logstash logstash 17M Nov 27 15:48 sincedb

Good, no issues with the sincedb permissions.
How are the logs named? Are they rolled over, or is there one log per day?

Don't forget to respond to Leandro's questions.

The log is rotated every day.
Today's log is named 2023-11-28.log and tomorrow's will be 2023-11-29.log.

  1. Oh, if it's atime, that timestamp keeps changing.
  2. Apart from the ignore_older parameter, is there any other way to collect only the current day's logs?

You didn't share the output of the stat command or any evidence of the duplication. Can you share that, for example a screenshot of Kibana showing duplicate lines?

atime is the access time; it does not necessarily mean that the file was changed.

What do you mean by that? Do you have daily logs, or are you renaming the log file? What happens to 2023-11-28.log when the day changes and 2023-11-29.log is created?

Also, is this path a network filesystem?

  1. Output of the stat command:
    File: ‘application.2023-11-27.5bc6b9b99c.log’
    Size: 15462033165 Blocks: 30199288 IO Block: 1048576 regular file
    Device: d4h/212d Inode: 129956680212 Links: 1
    Access: (0644/-rw-r--r--) Uid: ( 2000/ app) Gid: ( 2000/ app)
    Access: 2023-11-27 00:00:00.159117000 +0800
    Modify: 2023-11-28 00:00:00.031329000 +0800
    Change: 2023-11-28 00:00:00.199025000 +0800
    Birth: -
  2. Yes, I have daily logs.
  3. It is not a network filesystem; it's local storage.
  4. Besides ignore_older, is there any other way to collect only the current day's log?
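A worked check against the stat output above, assuming (as the file input documentation states) that ignore_older is compared with the file's last modification time (mtime), not its access time (atime):

$$
t_{\text{ignored after}} = t_{\text{mtime}} + \text{ignore\_older}
= \text{2023-11-28 00:00:00.03} + 86400\,\text{s}
= \text{2023-11-29 00:00:00.03}\ (+0800)
$$

So application.2023-11-27.5bc6b9b99c.log, whose mtime is just after midnight on 2023-11-28, is still eligible for the whole of 2023-11-28 even though its name carries the previous date. ignore_older only decides whether a file is picked up at all; whether content that was already read gets read again depends on the sincedb record for that file.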

The sincedb file should keep track of which files have already been read and how far (and its permissions look correct), yet for some reason the files have been read again.

Can you set log.level: trace in logstash.yml and restart Logstash? The log should then contain information about the sincedb.

OK, I set log.level: trace. Which keywords in the log should I pay attention to?

Hello, I would also like to ask: apart from the ignore_older parameter, how can I collect only the current day's logs?
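One possible alternative, offered as an untested sketch under stated assumptions: parse the bracketed timestamp at the start of each line and cancel events whose log time is more than 24 hours old. This does not stop Logstash from reading a file again; it only keeps old lines from reaching Kafka. It assumes every line starts with a timestamp in the format of the sample in the question, and log_ts is a hypothetical field name chosen for this example.

```
filter {
  # Extract the leading "[23/Nov/2023:14:12:37 +0800]" timestamp.
  # log_ts is a hypothetical field name used only in this sketch.
  grok {
    match => { "message" => "^\[%{HTTPDATE:log_ts}\]" }
  }
  # Use the log line's own time as the event time.
  date {
    match => ["log_ts", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
  # Cancel events whose log time is more than 86400 seconds in the past.
  ruby {
    code => "event.cancel if Time.now.to_i - event.get('@timestamp').to_i > 86400"
  }
}
```

Lines that fail to parse keep their ingestion time as @timestamp and are therefore not dropped. The check is a rolling 24-hour window rather than the calendar day; limiting output to the calendar day would need a date comparison instead.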

Are you tailing live logs, or reading closed files that are no longer being written?
I have dug through the documentation. Can you add the following settings:

```
input {
  file {
    start_position => "beginning"
    mode => "read"
    ignore_older => "1 d"
...
```

Explanation:

  • mode read: if read is specified, these settings can be used: ignore_older (older files are not processed).
  • start_position: choose where Logstash initially starts reading files, at the beginning or at the end. The default behavior treats files like live streams and thus starts at the end. If you have old data you want to import, set this to beginning. The default value is "end".
  • ignore_older: use the string notation, which is easier to read.
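Two details from the file input documentation are worth checking before trying read mode: in read mode start_position is ignored (files are always read from the beginning), and file_completed_action defaults to "delete", which removes each file once it has been read. The sketch below keeps the files and records completed paths instead; the file_completed_log_path value is a hypothetical location, and the remaining values are copied from the configuration in the question. Read mode is meant for files that are no longer being written, so it may not suit the current day's log while it is still growing.

```
input {
  file {
    path => ["/gateway/logs/access/*.log"]
    mode => "read"
    ignore_older => "1 d"
    # The default is "delete"; "log" keeps the file and appends its path to a log.
    file_completed_action => "log"
    # Hypothetical location for the list of completed files.
    file_completed_log_path => "/gateway/config/completed_files.log"
    sincedb_path => "/gateway/config/sincedb_path/sincedb"
    type => "acclog"
  }
}
```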
