We currently run Logstash as a service and use the “schedule” property in the Logstash configuration to run a query through the JDBC plugin. Each run pulls the previous 15 minutes of data (from System Time minus 15 minutes up to System Time), but it seems that each run skips a stretch of data roughly as long as the time between the query starting and the query finishing processing. This is most evident during times of peak volume.
For example, if the query were pulling in the previous 5 minutes of data:
[10:00:00] – Query starts pulling in data from 9:55 – 10:00
[10:00:30*] – Query finishes processing and pipeline terminates
[10:05:00] – Query finishes pulling in data from 9:55:30 – 10:00:00 but is missing data from 9:55:00 – 9:55:30.
[10:05:00] – New Query runs pulling in data from 10:00 – 10:05
[10:07:00] – Query finishes processing and pipeline terminates
[10:10:00] – Query finishes pulling in data from 10:02 – 10:05 but is missing data from 10:00 – 10:02
*The amount of time this query takes to run and process can vary greatly, from several seconds to several minutes.
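To make the arithmetic in the example above concrete, here is a small Python sketch. It is not a diagnosis of what Logstash does internally; it only models the pattern we observed, under the assumption that the start of the covered range slips forward by however long the run takes. The function name `observed_gap` and the date used are made up for illustration.

```python
from datetime import datetime, timedelta

def observed_gap(start, run_time, window=timedelta(minutes=5)):
    """Return (expected_from, covered_from, covered_to) for one scheduled run.

    Hypothetical model of the example only: the covered range begins
    `run_time` later than the range we expected the query to cover.
    """
    expected_from = start - window            # where the window should begin
    covered_from = expected_from + run_time   # where the data actually begins
    return expected_from, covered_from, start

day = datetime(2024, 1, 1)  # arbitrary date, only the times of day matter
runs = [
    (day.replace(hour=10, minute=0), timedelta(seconds=30)),  # run takes 30 s
    (day.replace(hour=10, minute=5), timedelta(minutes=2)),   # run takes 2 min
]

for start, run_time in runs:
    expected_from, covered_from, covered_to = observed_gap(start, run_time)
    print(
        f"run at {start:%H:%M:%S}: "
        f"got {covered_from:%H:%M:%S} - {covered_to:%H:%M:%S}, "
        f"missing {expected_from:%H:%M:%S} - {covered_from:%H:%M:%S}"
    )
```

With the two runs from the example, this reports 9:55:00 to 9:55:30 as missing for the first run and 10:00 to 10:02 for the second, so the size of the gap tracks the run time rather than being constant.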
When we ran the command manually from the CLI, there did not seem to be an issue, since the pipeline was not starting and terminating between runs. Running it as a cron job gave results similar to the example above, including the data loss.