I'm new to the Elastic Stack and have been learning how to make things work. My client wants to be able to get stats for their hardware device which checks for software upgrades every evening. They use CDN77 to host their upgrade files.
CDN77's logs have a custom structure, which I'm handling with a custom Logstash grok filter. I'm setting up the Elastic Stack with Filebeat shipping the log files to Logstash.
This is all working now and I just need to create dashboards for my client. The problem I have is fetching the log files for Filebeat. CDN77 provide an API for downloading them; the files are named like 26Feb2019.gz, each containing 26Feb2019.txt.
Most software upgrades take place between 00:50-05:30 and my client wants to check how things went the same morning, so they cannot wait until CDN77 has finished writing the log file for the current day and moved on to the next one.
So what is the best way to download the log files every hour and present them to Filebeat without producing duplicate entries or otherwise causing problems for Filebeat? I plan to set up a cron job that fetches the current day's log file every hour. Should I copy it directly over the existing file with the matching name, should I create a separate hourly log file, or is there a better approach?
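For reference, here is a sketch of the hourly update step I'm considering (the function name and paths are my own placeholders, and the actual CDN77 API download is left out). Since the daily log is append-only, after downloading and gunzipping the full file I would append only the bytes Filebeat hasn't seen yet; because the file Filebeat tails keeps the same inode and only ever grows, Filebeat should resume from its saved offset and not re-ship old lines:

```shell
#!/bin/sh
set -eu

# Append-only update: $1 is the file Filebeat is tailing,
# $2 is today's full log, freshly downloaded and gunzipped.
# Only the new tail of $2 is appended to $1, so Filebeat's
# saved offset in its registry stays valid.
append_new_lines() {
    current=$1
    fresh=$2
    touch "$current"
    old=$(wc -c < "$current")
    new=$(wc -c < "$fresh")
    if [ "$new" -gt "$old" ]; then
        # +1 because tail -c +N starts at byte N (1-based)
        tail -c +"$((old + 1))" "$fresh" >> "$current"
    fi
}
```

An hourly crontab entry would then fetch 26Feb2019.gz via the API, gunzip it to a temporary file, and run something like `append_new_lines /var/log/cdn77/26Feb2019.txt /tmp/26Feb2019.txt` against the directory Filebeat watches.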
Finally, I'm looking at using DigitalOcean to host this, as they can run Docker containers, which is what I've been using to build the Elastic Stack. What would be the minimum memory requirement to run ELK and Filebeat on the same machine? We're looking at around 20,000 rows per day in the log files.