Help me to refine this query

Context:

  • Logstash Elasticsearch Input plugin does not maintain a state like JDBC input plugin.
  • However it does provide scheduling support.
  • So I decided to pull data daily from the previous day records, e.g. from say 4th Aug 00:00:00:000 to 5th Aug 00:00:00:000. If there is anyother way please do share.

Here is the script I came up with. Somehow I feel there should be a better way of doing it. Also the CPU on the nodes shoots up high when I run this.

The idea is to get current date and set the time to 00:00:00:000 and then get the date before it. Then use these two dates to get all the events falling between them.

GET testindex/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "script": {
            "script": {
              "source": """
              SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
              Calendar start = Calendar.getInstance();
              Calendar end = Calendar.getInstance();
              end.set(Calendar.HOUR_OF_DAY, 0);
              end.set(Calendar.MINUTE, 0);
              end.set(Calendar.SECOND, 0);
              start.add(Calendar.HOUR, -1*24);
              start.set(Calendar.HOUR_OF_DAY, 0);
              start.set(Calendar.MINUTE, 0);
              start.set(Calendar.SECOND, 0);
              def startTime = start.getTimeInMillis();
              def endTime = end.getTimeInMillis();
              def eventTime = doc["@timestamp"].date.toInstant().toEpochMilli();
              return eventTime > startTime && eventTime <= endTime;"""
            }
          }
        },
        {
          "match": {
            "message": "This is expensive to run !!"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "asc"
      }
    }
  ]
}

As of now it works but I am sure there is a more native way of doing this in Elasticsearch painless script. Maybe some in built DateTime/Java.Time way rather than the Java.Calendar.

EDIT: Changed tags to Logstash

Am I misunderstanding something or isn't that just a very laborious way to implement a date range as a script? And even the exact same query as the one in the documentation?
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#ranges-on-dates

"range": {
  "@timestamp": {
    "gte": "now-1d/d",
    "lt": "now/d"
   }
}

(Edit: Well. To be precise, your query would be this with gt and lte instead :))

It is critical that I do not lose any events at the start and the end of the timeframe (I am using elaspsed time plugin of logstash).

I am not very confident if running the job at daily schedule at say 12 in night will get all the samples at the edges. The now might (will?) have different values on the three nodes I have by the time it starts execution. I did not find the documentation stating that now value will be fixed at the issue of the command.

Hence decided to see if I can provide a hard range of (Day-1) 00:00:000 to (Day) 00:00:000.

Happy to be proven wrong and super happy to use the now-1d/d based ranges. It will make life super easy.

BTW I did try this also as one of the suggestions but it did not work:

"range": {
  "@timestamp": {
    "gte": "now-1d/d",
    "lt": "now/d",
   "format": "yyyy-MM-dd"
   }
}

But now still had the time part. Is there any way to round off now to nearest hour? That can also work for me.

What did it do? This should give you the documents between midnight of the day before and midnight today. (I don't know why the format parameter is there when only relative times are used in the query, so the format isn't used anywhere?)

What do you mean?

You could use /h instead of /d to round down to the hour instead of the day. That is explained in the date math documentation that is linked in the article mentioned above.


Edit: As a side note: You could have mentioned that you've already got a topic for this. Didn't setting the time_zone help?

Hi @Jenni,

I did have that topic but it did not help. I think I should have provided an closure on that. Apologies for that.

My understanding of now was flawed. :disappointed:
Re-reading the document sorted everything. :slightly_smiling_face:
I went for hourly collection of data instead of daily.

"range": {
      "timestamp": {
        "gte": "now-1h/h",
        "lt": "now/h"
      }
    }

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.