The Logstash Elasticsearch input plugin does not maintain state the way the JDBC input plugin does.
However, it does support scheduling.
So I decided to pull the previous day's records once a day, e.g. from 4th Aug 00:00:00.000 to 5th Aug 00:00:00.000. If there is another way, please do share.
Here is the script I came up with, though I feel there should be a better way of doing it. Also, the CPU on the nodes spikes when I run it.
The idea is to get the current date, set its time to 00:00:00.000, and then take the day before it. These two dates then bound the events to fetch.
It works for now, but I am sure there is a more native way of doing this in an Elasticsearch Painless script, maybe some built-in java.time approach rather than java.util.Calendar.
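For comparison, here is a minimal java.time sketch of the same idea: take "today" in an explicit time zone, and use its start and the start of the day before as the bounds. It is plain Java, not Painless; Painless exposes much of java.time, but whether each call below (Clock in particular) is available there is an assumption to check against the Painless API reference. The class and method names are hypothetical.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class PreviousDayRange {
    /** Returns {start, end}: start of yesterday and start of today, in the given zone. */
    public static ZonedDateTime[] previousDay(Clock clock, ZoneId zone) {
        // "Today" as seen in the chosen zone, not in the JVM default zone.
        LocalDate today = LocalDate.now(clock.withZone(zone));
        // atStartOfDay(zone) also copes with zones where midnight falls in a DST gap.
        ZonedDateTime end = today.atStartOfDay(zone);
        ZonedDateTime start = today.minusDays(1).atStartOfDay(zone);
        return new ZonedDateTime[] {start, end};
    }

    public static void main(String[] args) {
        ZonedDateTime[] range = previousDay(Clock.systemUTC(), ZoneId.of("Europe/Berlin"));
        System.out.println("from " + range[0].toOffsetDateTime() + " to " + range[1].toOffsetDateTime());
    }
}
```

Passing a Clock rather than calling LocalDate.now() directly keeps the computation testable with a fixed instant.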
It is critical that I do not lose any events at the start or end of the time frame (I am using the Logstash elapsed filter plugin).
I am not very confident that running the job on a daily schedule at, say, midnight will get all the samples at the edges. The value of now might (will?) differ across my three nodes by the time execution starts, and I did not find any documentation stating that now is fixed when the command is issued.
Hence I decided to see if I can provide a hard range of (Day-1) 00:00:00.000 to (Day) 00:00:00.000.
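If the hard range is computed once, by whatever launches the job, rather than by each node, the query itself only needs two absolute timestamps. A sketch of building such a range query body, assuming the event time lives in a field named "@timestamp" (substitute your own field); the class and method names are hypothetical. Using gte for the start and lt for the end makes the interval half-open, so an event stamped exactly at midnight belongs to exactly one day: it is neither lost nor counted twice.

```java
import java.time.Instant;

public class AbsoluteRangeQuery {
    /** Builds a range query over [start, end): gte includes start, lt excludes end. */
    public static String rangeQuery(String field, Instant start, Instant end) {
        // Instant.toString() yields ISO-8601 UTC, e.g. 2019-08-04T00:00:00Z.
        return String.format(
            "{\"query\":{\"range\":{\"%s\":{\"gte\":\"%s\",\"lt\":\"%s\"}}}}",
            field, start, end);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2019-08-04T00:00:00Z");
        Instant end = Instant.parse("2019-08-05T00:00:00Z");
        System.out.println(rangeQuery("@timestamp", start, end));
    }
}
```

A plain range query like this also avoids running a script per document, which may be part of why the CPU spikes.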
Happy to be proven wrong, and super happy to use now-1d/d-based ranges; that would make life much easier.
BTW, I also tried this, which was one of the suggestions, but it did not work:
What did it do? This should give you the documents between midnight of the day before and midnight today. (I don't know why the format parameter is there when the query only uses relative times; the format isn't used anywhere, is it?)
What do you mean?
You could use /h instead of /d to round down to the hour instead of the day. That is explained in the date math documentation that is linked in the article mentioned above.
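One way to see what the suggested range (gte now-1d/d, lt now/d) resolves to is to mimic the rounding in java.time. This is an approximation of the documented behaviour (for gte and lt, rounding goes down to the start of the unit, and time_zone decides where that boundary falls on the local clock), not Elasticsearch's own code; the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

public class DateMathSketch {
    /** Approximates {"gte": "now-1d/unit", "lt": "now/unit"} for a given now and time zone. */
    public static Instant[] resolve(Instant now, ZoneId timeZone, ChronoUnit unit) {
        // Rounding is applied on the local clock of the requested time zone.
        ZonedDateTime local = now.atZone(timeZone);
        Instant gte = local.minusDays(1).truncatedTo(unit).toInstant();
        Instant lt = local.truncatedTo(unit).toInstant();
        return new Instant[] {gte, lt};
    }

    public static void main(String[] args) {
        // Two hypothetical nodes whose clocks differ by a few minutes just after midnight UTC.
        Instant nodeA = Instant.parse("2019-08-05T00:00:01Z");
        Instant nodeB = Instant.parse("2019-08-05T00:04:59Z");
        for (Instant now : new Instant[] {nodeA, nodeB}) {
            Instant[] r = resolve(now, ZoneOffset.UTC, ChronoUnit.DAYS);
            System.out.println(now + " -> gte " + r[0] + ", lt " + r[1]);
        }
    }
}
```

Both lines print the same bounds, 2019-08-04T00:00:00Z to 2019-08-05T00:00:00Z: because of the /d rounding, small clock differences only matter if they straddle midnight, so scheduling the run a few minutes after midnight sidesteps that case. Passing ChronoUnit.HOURS corresponds to the /h variant.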
Edit: as a side note, you could have mentioned that you already have a topic open for this. Didn't setting the time_zone help?