Logstash repeatedly collects the same logs

kubo_Smith · November 23, 2023, 7:05am

My log lines look like this:
[23/Nov/2023:14:12:37 +0800] | gateway | [http] | 195.161.250.29 | POST /gateway/xxx HTTP/1.1 | 9090 | 200 | 182 | sss67jhsd | 0 | gateway | - | - | - | - | Java/1.8.0_362
Logstash (version 7.4.2) configuration:

input {
  file {
    path => ["/gateway/logs/access/*.log"]
    ignore_older => 86400
    type => "acclog"
    max_open_files => 65536
    sincedb_path => "/gateway/config/sincedb_path/sincedb"
  }
}
output {
  if [type] == "acclog" {
    kafka {
      retries => 2
      bootstrap_servers => "kafka2:9092"
      topic_id => 'acclog'
    }
  }
}

Problem description:
I have set ignore_older => 86400, but logs are still collected repeatedly. How can I solve this problem?

leandrojmp · November 24, 2023, 12:24pm

Can you provide more context about this? Share some examples of data?

The ignore_older setting makes Logstash ignore any file that hasn't been modified within the specified time, in this case 86400 seconds. If such a file is modified again, it is no longer ignored.

Do you have files older than 86400 seconds that Logstash did not ignore? Can you provide some evidence, such as the logs from those files and the output of the stat /path/file.log command on Linux?
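
Because ignore_older is evaluated against the file's modification time and not against the timestamp inside each log line, limiting collection to recent entries is a separate, per-event step. The following is a minimal sketch of such a step, not a tested configuration. It assumes lines in the format shown in the first post and takes "the day's logs" to mean entries from the last 24 hours. The field name log_time is illustrative and does not appear in the original configuration.

```conf
filter {
  if [type] == "acclog" {
    # Isolate the leading "[23/Nov/2023:14:12:37 +0800]" timestamp.
    # log_time is a hypothetical field name; ?rest is a skipped field.
    dissect {
      mapping => { "message" => "[%{log_time}] | %{?rest}" }
    }
    # Use the log line's own time as @timestamp.
    # If parsing fails, @timestamp stays at the ingestion time.
    date {
      match => ["log_time", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
    # Drop events whose timestamp is more than 86400 seconds in the past.
    ruby {
      code => "event.cancel if event.get('@timestamp').to_f < Time.now.to_f - 86400"
    }
  }
}
```

This does not stop a file from being read again; it only discards old entries after they have been read.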

Rios · November 24, 2023, 3:06pm

Just to add: maybe the logstash user doesn't have write access to the sincedb file. Check the permissions.

kubo_Smith · November 27, 2023, 7:48am

Oh, if it goes by atime, that time does keep changing.
Besides the ignore_older parameter, is there any other way to collect only the current day's logs?

kubo_Smith · November 27, 2023, 7:51am

permissions:
-rw-r--r-- 1 logstash logstash 17M Nov 27 15:48 sincedb

Rios · November 27, 2023, 9:22am

Good, no issues with sincedb.
How are the logs named? Are they rolled over, or are they daily logs?

Don't forget to respond to Leandro's questions.

kubo_Smith · November 28, 2023, 1:54am

The log is rotated every day, so there is one file per day.
Today's log is named 2023-11-28.log and tomorrow's will be 2023-11-29.log.

leandrojmp · November 28, 2023, 12:31pm

You didn't share the output of the stat command or any evidence of the duplication. Can you share that? For example, a screenshot of Kibana showing duplicate lines.

atime is the access time; a change in atime does not necessarily mean that the file was modified.

What does this mean? Do you have daily logs, or are you renaming the log file? What happens to 2023-11-28.log when the day changes and 2023-11-29.log is created?

leandrojmp · November 28, 2023, 12:40pm

Also, is this path a network filesystem?
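
One way to gather the evidence asked for above is to temporarily write each event together with the file it came from, so that duplicated lines can be traced back to a specific file. This is a minimal sketch with two assumptions: that the file input adds its usual path field to every event (the behaviour in Logstash 7.x without ECS compatibility), and that a debug file location is available. The location shown is hypothetical.

```conf
output {
  if [type] == "acclog" {
    # Temporary debug output, used alongside the existing kafka output.
    file {
      # Hypothetical location; any path writable by the logstash user works.
      path => "/tmp/acclog-debug.log"
      # One line per event: the source file, then the original log line.
      codec => line { format => "%{path} | %{message}" }
    }
  }
}
```

Identical lines in the debug file would then show which source file was read more than once.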

kubo_Smith · November 29, 2023, 1:55am

The output of the stat command:
  File: ‘application.2023-11-27.5bc6b9b99c.log’
  Size: 15462033165  Blocks: 30199288  IO Block: 1048576  regular file
Device: d4h/212d  Inode: 129956680212  Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 2000/ app)  Gid: ( 2000/ app)
Access: 2023-11-27 00:00:00.159117000 +0800
Modify: 2023-11-28 00:00:00.031329000 +0800
Change: 2023-11-28 00:00:00.199025000 +0800
 Birth: -
Yes, I have daily logs.
It is not a network filesystem; it is local storage.
Is there any other way to collect only the current day's logs besides ignore_older?

Rios · November 29, 2023, 4:50am

The sincedb file exists and has the correct permissions, so it should be keeping track of what has already been read. However, for some reason the files have been read again.

Can you set log.level: trace in logstash.yml and restart Logstash? The log should then contain information about sincedb.

kubo_Smith · November 29, 2023, 6:14am

OK, I set log.level: trace, but which keywords should I look for in the log?

kubo_Smith · December 4, 2023, 3:13am

Hello. In addition, I would like to ask how I can collect only the current day's logs by some means other than the ignore_older parameter.

Rios · December 4, 2023, 7:34am

Are you tailing live logs, or reading closed files that are no longer being written to?
I dug through the documentation. Can you add the following settings:

input {
  file {
    start_position => "beginning"
    mode => "read"
    ignore_older => "1 d"
...

Explanation:

mode - If read is specified, ignore_older is heeded (older files are not processed).
start_position - Chooses where Logstash initially starts reading files: at the beginning or at the end. The default value is "end", which treats files like live streams. If you have old data you want to import, set this to "beginning". Note that in read mode this setting is ignored and files are always read from the beginning.
ignore_older - The string notation ("1 d") is easier to read than a number of seconds.
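
One point worth checking before trying read mode: according to the file input documentation, file_completed_action defaults to "delete" in read mode, so a file is removed once it has been read to the end unless that setting is changed. The following is a minimal sketch that merges the settings above into the configuration from the first post and keeps the files. The completed-log path is a hypothetical location, and whether read mode suits a file that is still being written is something to verify on a test file first.

```conf
input {
  file {
    path => ["/gateway/logs/access/*.log"]
    type => "acclog"
    mode => "read"
    ignore_older => "1 d"
    sincedb_path => "/gateway/config/sincedb_path/sincedb"
    # Record finished files in a log instead of deleting them.
    file_completed_action => "log"
    # Hypothetical path; required when the action is "log" or "log_and_delete".
    file_completed_log_path => "/gateway/config/sincedb_path/completed.log"
  }
}
```

start_position is left out because it has no effect in read mode.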
