We have an ELK stack app that has been down for over a month due to a credentials issue in the Logstash CloudWatch plugin. The plugin is ingesting data again now, but what is strange is that it is ingesting logs from the beginning of time, i.e. logs from over two years ago. Also, no data is being output to Elasticsearch, perhaps because that data has already been transformed and output previously?
My main question: is this typical behavior? I'm not very familiar with Logstash and Elasticsearch, but I can't imagine that every time you restart Logstash it starts ingesting every CloudWatch log from the very first entry. Not sure if it will help, but here is the Logstash conf file for the CloudWatch plugin:
No. The input tracks what it has ingested in the sincedb. If "/var/lib/.sincedb" were removed then it would start over at the beginning, as you are seeing.
How do you know no data is going to Elasticsearch? Could it be that your index rotation is automatically deleting indices that contain two-year-old data?
more /var/lib/.sincedb should work; it is written as a plain text file. The number next to the group identifier is the timestamp of the last message that was read, in milliseconds since the epoch (.strftime("%Q")).
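If you want to turn that millisecond value into something readable, a quick shell conversion works (the timestamp below is a made-up example, not from your file):

```shell
# Hypothetical value copied from a sincedb line, in ms since the epoch
ts_ms=1625097600000

# Drop the milliseconds and render as a UTC date (GNU date)
date -u -d "@$((ts_ms / 1000))" +"%Y-%m-%d"
# prints 2021-07-01 for this example value
```

If the date printed for your real sincedb value is from 2021, the input genuinely restarted from the beginning.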
The AWS API returns an array of events, each of which has log_stream_name, timestamp, message, ingestion_time, and event_id fields. The [@timestamp] field is set from the timestamp field, not the ingestion_time field. Is the @timestamp current, or in 2021? If the latter, it sounds like the issue is on the AWS side.
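You can check what CloudWatch itself is returning by querying the group directly and comparing the two fields (the group name and region below are placeholders):

```shell
# Print timestamp and ingestionTime (both ms since the epoch) for a few events
aws logs filter-log-events \
  --log-group-name my-log-group \
  --region us-east-1 \
  --limit 5 \
  --query 'events[].[timestamp,ingestionTime]' \
  --output text
```

If the timestamp column is two years old while ingestionTime is recent, the old @timestamp values are coming from the event data itself.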
That should work, or delete the entry for the group and set start_position => end. (You need to stop logstash before changing the file and restart it afterwards.)
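For reference, the relevant part of the input config might look like this (the group name and region are placeholders, not your actual settings):

```conf
input {
  cloudwatch_logs {
    log_group => ["my-log-group"]   # placeholder
    region => "us-east-1"           # placeholder
    # Only honored for groups with no sincedb entry; otherwise the sincedb wins
    start_position => "end"
  }
}
```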