How to speed up script query

Hello,

I have an mapping for a field visit_times which is a date type and contains many epoch timestamps. I have some complex queries such as checking the dayOfWeek of any of the values within that array or even checking just within timestamps.

Here is an example query which takes really long:


POST /programs/_search
{
  "query": {
    "bool": {
      "must": [
         {
          "script": {
            "script": {
              "lang": "painless",
              "source": """              
                for (visit_time in doc['visit_times']) {              
                  long milliseconds = visit_time.toInstant().toEpochMilli();
                  if (milliseconds >= Long.parseLong("1582588800000") && milliseconds <= Long.parseLong("1582675199999")) {
                    return true;                
                  }              
                }              
                return false;            
              """
            }
          }
        }
      ]
    }
  }
}

OR this query:


POST /programs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "program_id": 155
          }
        },
        {
          "script": {
            "script": {
              "lang": "painless",
              "source": """
                for (date_field in doc['last_waitlist_time']) {
                    int hourOfDay = date_field.getHour();
                  int minuteOfHour = date_field.getMinuteOfHour();
                  if (hourOfDay < Integer.parseInt("23") && minuteOfHour < Integer.parseInt("00")) {
                      return true;
                  }
                }
                return false;
              """
            }
          }
        }
      ]
    }
  }
}

Mapping for that field:

       "visit_times" : {
          "type" : "date",
          "null_value" : "0"
        },

This takes quite a bit of time compared to other queries. Is there another way to do this? It's really quick using expression however that doesn't loop through the whole array, instead it finds the minimum. IIRC.

Expressions are faster for the reason you described: they only look at one value. Additionally, it looks like you are unnecessarily parsing integers from strings? And when would minuteOfHour be less than 0? Seems like you would always iterate the entire list of times.

You might consider reindexing, or using an ingest processor to extract the time (hour+min, maybe as a decimal) into its own field. This way you can use normal range queries to find what you are looking for. In general, if a script is too slow, it means you are doing too much work on too many documents. You can reduce the number of documents the script runs on with additional query clauses, reduce the complexity of the script eg by only looking at the first time as expressions would do, or reindex to get your data in a form that is more efficiently queryable.

1 Like

@rjernst Thanks so much for the answer. I agree with all of it! I was just wondering if there was a way to speed up the for loop but the reindexing or ingest processor is definitely the best option.

I found out that using a range query is by far the fastest but is there a way to use range query for DayOfWeek or Hour/Minute, maybe using format?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.