Painless Scripting

Hi,
I am trying to get a count of the requests hitting my server by time range, and have come up with the query below:

GET /logs-prod*/_search?pretty=true
{
  "query": {
    "bool": {
      "filter": [
        {
          "match_phrase": {
            "type": "REQ"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "2022-03-29T01:00:00.000Z",
              "lte": "2022-03-29T15:00:00.000Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "txn-count": {
      "terms": {
        "script": "if (doc[\"timestamp\"].size() == 0) return '';else if (doc[\"timestamp\"].value.getHour() >= 1 && doc[\"timestamp\"].value.getHour() <= 14 ) return \"Peak Transactions\"; else return \"Off-Peak Transactions\""
      }
    }
  }
}
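The inline one-liner is hard to read with the JSON escaping; unescaped, the Painless source is (identical logic):

```painless
if (doc["timestamp"].size() == 0) {
  return "";
} else if (doc["timestamp"].value.getHour() >= 1 && doc["timestamp"].value.getHour() <= 14) {
  return "Peak Transactions";
} else {
  return "Off-Peak Transactions";
}
```

Note that `getHour()` reads the stored UTC value, so the bucketing is in UTC regardless of the cluster's local time zone.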

However, the result comes back as below:

  "aggregations" : {
    "txn-count" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Peak Transactions",
          "doc_count" : 164783
        }
      ]
    }
  }
}

i.e. everything is being grouped into Peak Transactions. What am I doing wrong?

Thanks in advance!

Hey, you are filtering from 1am till 3pm in your bool query with a range filter, so, with the exception of documents indexed at exactly 3pm, all of your docs fall into Peak Transactions anyway.
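One way to see this is to break the same filtered window down by hour (a sketch reusing the index pattern and filters from the query above; `calendar_interval` is the modern parameter name): every non-empty bucket except a possible 15:00 one falls inside the script's 1-14 hour window.

```json
GET /logs-prod*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "match_phrase": { "type": "REQ" } },
        {
          "range": {
            "timestamp": {
              "gte": "2022-03-29T01:00:00.000Z",
              "lte": "2022-03-29T15:00:00.000Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "per-hour": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "hour"
      }
    }
  }
}
```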

@spinscale it does not seem so, please see below:

GET /logs-prod*/_count
{
  "query": {
    "bool": {
      "filter": [
        {
          "match_phrase": {
            "type": "REQ"
          }
        },
        {
          "range": {
            "timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2022-03-29T01:00:00.000Z",
              "lte": "2022-03-29T14:00:00.000Z"
            }
          }
        }
      ]
    }
  }
}
{
  "count" : 161655,
  "_shards" : {
    "total" : 211,
    "successful" : 211,
    "skipped" : 0,
    "failed" : 0
  }
}

Thanks.

Sorry, but I fail to understand what this is supposed to prove, compared to what I said above.

@spinscale since I am using the condition,

if (doc["timestamp"].value.getHour() >= 1 && doc["timestamp"].value.getHour() <= 14) return "Peak Transactions"

in the script, shouldn't 161655 documents fall under Peak Transactions?

Thanks!

Can you share the full aggregation response, and also add the show_term_doc_count_error parameter? See Terms aggregation | Elasticsearch Guide [8.1] | Elastic

@spinscale sure, here you go. I executed the query below (hope I used the parameter correctly):

GET /logs-prod*/_search?pretty=true
{
  "query": {
    "bool": {
      "filter": [
        {
          "match_phrase": {
            "type": "REQ"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "2022-03-29T01:00:00.000Z",
              "lte": "2022-03-29T15:00:00.000Z"
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "txn-count": {
      "terms": {
        "show_term_doc_count_error": true,
        "script": "if (doc[\"timestamp\"].size() == 0) return '';else if (doc[\"timestamp\"].value.getHour() >= 1 && doc[\"timestamp\"].value.getHour() <= 14 ) return \"Peak Transactions\"; else return \"Off-Peak Transactions\""
      }
    }
  }
}

Output is below

{
  "took" : 222,
  "timed_out" : false,
  "_shards" : {
    "total" : 212,
    "successful" : 212,
    "skipped" : 211,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "txn-count" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Peak Transactions",
          "doc_count" : 164783,
          "doc_count_error_upper_bound" : 0
        }
      ]
    }
  }
}

Thanks.

Can you enable tracking of total hits (Search your data | Elasticsearch Guide [8.1] | Elastic)? Not yet sure about that discrepancy... the numbers are supposed to be exact, as only one shard seems to be involved.

@spinscale

GET /logs-prod*/_search?pretty=true
{
  "track_total_hits": true,
  "query": {
    "bool": {
      "filter": [
        {
          "match_phrase": {
            "type": "REQ"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "2022-03-29T01:00:00.000Z",
              "lte": "2022-03-29T15:00:00.000Z"
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "txn-count": {
      "terms": {
        "show_term_doc_count_error": true,
        "script": "if (doc[\"timestamp\"].size() == 0) return '';else if (doc[\"timestamp\"].value.getHour() >= 1 && doc[\"timestamp\"].value.getHour() <= 14 ) return \"Peak Transactions\"; else return \"Off-Peak Transactions\""
      }
    }
  }
}

Output

{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 212,
    "successful" : 212,
    "skipped" : 211,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 164783,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "txn-count" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Peak Transactions",
          "doc_count" : 164783,
          "doc_count_error_upper_bound" : 0
        }
      ]
    }
  }
}

So the query returns 164783 hits, and the aggregation shows the same count. That part looks good to me.

Above you stated that 161655 documents are returned. Why isn't that the case here, when you are using the same query against the same indices?
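If the cause is the differing upper bounds (the _count request used lte 14:00:00.000Z, this search lte 15:00:00.000Z, and getHour() <= 14 matches any timestamp up to 14:59:59.999Z), then counting just that final hour should close the gap. A hypothetical check with the same index pattern and filter; if the explanation holds, the count should come out near 164783 - 161655 = 3128:

```json
GET /logs-prod*/_count
{
  "query": {
    "bool": {
      "filter": [
        { "match_phrase": { "type": "REQ" } },
        {
          "range": {
            "timestamp": {
              "gt": "2022-03-29T14:00:00.000Z",
              "lte": "2022-03-29T15:00:00.000Z"
            }
          }
        }
      ]
    }
  }
}
```

If the count does not line up, a time-zone mismatch between the range filter and getHour() (which works on the UTC value) would be the next thing to check.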

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.