We have an ELK stack app that has been down for over a month due to a credentials issue in the Logstash CloudWatch plugin. The plugin is ingesting data again now, but what is strange is that it is ingesting logs from the beginning of time, i.e. logs from over two years ago. Also, no data is being output to Elasticsearch, perhaps because that data has already been transformed and output previously?
My main question: is this typical behavior? I'm not very familiar with Logstash and Elasticsearch, but I can't imagine that every time you restart Logstash it starts ingesting every CloudWatch log from the very first entry. Not sure if it will help, but here is the Logstash conf file for the CloudWatch plugin:
No. The input tracks what it has ingested in the sincedb. If "/var/lib/.sincedb" were removed then it would start over at the beginning, as you are seeing.
How do you know no data is going to Elasticsearch? Could it be that your index rotation is automatically deleting indices that contain two-year-old data?
more /var/lib/.sincedb should work; it is written as a plain text file. The number next to the group identifier is the timestamp of the last message that was read, in milliseconds since the epoch (.strftime("%Q")).
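If you want to turn that millisecond value into something readable, a quick shell conversion works (the timestamp below is a made-up example, not from your file):

```shell
# Hypothetical value copied from a sincedb line, in ms since the epoch
ts_ms=1625097600000

# Drop the milliseconds and render as a UTC date (GNU date)
date -u -d "@$((ts_ms / 1000))" +"%Y-%m-%d"
# prints 2021-07-01 for this example value
```

If the date printed for your real sincedb value is from 2021, the input genuinely restarted from the beginning.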
The AWS API returns an array of events, each of which has log_stream_name, timestamp, message, ingestion_time, and event_id fields. The [@timestamp] field is set from the timestamp field, not the ingestion_time field. Is the @timestamp current, or in 2021? If the latter, it sounds like the issue is on the AWS side.
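You can check what CloudWatch itself is returning by querying the group directly and comparing the two fields (the group name and region below are placeholders):

```shell
# Print timestamp and ingestionTime (both ms since the epoch) for a few events
aws logs filter-log-events \
  --log-group-name my-log-group \
  --region us-east-1 \
  --limit 5 \
  --query 'events[].[timestamp,ingestionTime]' \
  --output text
```

If the timestamp column is two years old while ingestionTime is recent, the old @timestamp values are coming from the event data itself.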
That should work, or delete the entry for the group and set start_position => end. (You need to stop logstash before changing the file and restart it afterwards.)
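For reference, the relevant part of the input config might look like this (the group name and region are placeholders, not your actual settings):

```conf
input {
  cloudwatch_logs {
    log_group => ["my-log-group"]   # placeholder
    region => "us-east-1"           # placeholder
    # Only honored for groups with no sincedb entry; otherwise the sincedb wins
    start_position => "end"
  }
}
```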