Aggregate with a script yields different results

Hi,
I'm trying to get the average of the time difference between two time fields with the following query:

GET x/_search
{
  "_source": ["ts", "@timestamp"], 
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
   "size": 1,
   "script_fields" : {
        "test1" : {
            "script" : {
                "lang": "painless",
                "source": "(doc['@timestamp'].value.millis - doc['ts'].value.millis) / 1000"
            }
        }
   }
   , "aggs": {
     "avarge_delay": {
       "avg": {
         "script": {
                "source": "(doc['@timestamp'].value.millis - doc['ts'].value.millis) / 1000",
                "lang": "painless"
            }
       }
     }
   }
  
}

the "test1" scripted field gives the wanted result. but the aggregation is way off of the average (i gave it size =1 ).
as you can see i used the exact same script

If I return only one document (size=1), I get test1=4000, and the aggregated value is about 1000 times larger.

I'll be happy if anyone can point me to my mistake.

That's what I get when querying with size=1.

Your script field will return you the value of @timestamp - ts in seconds for that one document that you are returning. That's what script fields do, they return a script calculated value per hit.

The avg aggregation will run on all documents and return you the average value of @timestamp - ts across all docs. Setting size=1 has no influence on the aggregation scope. That scope is determined by all documents that match your query (a match_all query in your case).
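A quick way to see this separation for yourself is to set size to 0: the hits (and with them any script fields) disappear entirely, while the aggregation still runs over every document that matches the query. A minimal sketch:

```json
GET x/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "average_delay": {
      "avg": {
        "script": {
          "source": "(doc['@timestamp'].value.millis - doc['ts'].value.millis) / 1000",
          "lang": "painless"
        }
      }
    }
  }
}
```

This will return the same aggregated value no matter what size you use, because size only affects how many hits are returned.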

Thank you for your reply!
I'm fairly new to Elasticsearch. In order to get the average of the last 200 documents by @timestamp, should I filter somehow under the match_all query? How can the avg be computed over only the last 200?

thank you again for helping!

What you're asking is not trivial :slightly_smiling_face:

If you simply wanted to get the average over a specific time period, that would be easy. You could use a range query that limits the documents to, for example, just those of the last two hours, using something called date math (now-2h in the example below):

GET x/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gt": "now-2h"
      }
    }
  },
  "aggs": {
    "average_delay": {
      "avg": {
        "script": {
          "source": "(doc['@timestamp'].value.millis - doc['ts'].value.millis) / 1000",
          "lang": "painless"
        }
      }
    }
  }
}

If you really want to aggregate on a specific number of documents then you would have to do some advanced stuff. You could use a distance_feature query (new in Elasticsearch 7.2) to give your documents a score based on how old they are. You could combine that with a sampler aggregation that limits the scope to just a specific number of highest scoring documents (i.e. the most recent documents). That request would look something like this:

{
  "size": 0,
  "query": {
    "distance_feature": {
      "field": "@timestamp",
      "pivot": "730d",
      "origin": "now"
    }
  },
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 200
      },
      "aggs": {
        "average_delay": {
          "avg": {
            "script": {
              "source": "(doc['@timestamp'].value.millis - doc['ts'].value.millis) / 1000",
              "lang": "painless"
            }
          }
        }
      }
    }
  }
}

Note: that last request will only work if you are querying a single index with a single shard (the default in Elasticsearch 7).

Thanks so much for your assistance again!
The first, simple method would be more than enough.
Unfortunately, it didn't solve my problem. I ran the following:

GET X/_search
{
  "_source": ["ts", "@timestamp"],
  "size": 1,
  "query": {
    "range": {
      "@timestamp": {
        "gt": "now-2h"
      }
    }
  },
  "script_fields" : {
        "test1" : {
            "script" : {
                "lang": "painless",
                "source": "(doc['@timestamp'].value.millis - doc['ts'].value.millis) / 1000"
            }
        }
   },
  "aggs": {
    "average_delay": {
      "avg": {
        "script": {
          "source": "(doc['@timestamp'].value.millis - doc['ts'].value.millis) / 1000",
          "lang": "painless"
        }
      }
    }
  }
}

and got the same results. The aggregated value doesn't make sense at all...
It's bizarre, because test1 yields the correct value.

This is the outcome:

{
  "took": 402,
  "timed_out": false,
  "_shards": {
    "total": 52,
    "successful": 52,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2996939,
    "max_score": 1,
    "hits": [
      {
        "_index": "X",
        "_type": "logs",
        "_id": "11Qf02wBfoJnQ0YMzlqG",
        "_score": 1,
        "_source": {
          "@timestamp": "2019-08-27T12:49:32.086Z",
          "ts": "2019-08-27T12:49:31,320"
        },
        "fields": {
          "test1": [
            0
          ]
        }
      }
    ]
  },
  "aggregations": {
    "average_delay": {
      "value": 16937778.22412268
    }
  }
}

Do you have any guesses?
Thanks again for your help!

You're comparing apples and oranges again. The script field returns the value of @timestamp - ts in seconds for the one document that you are returning. The aggregation returns the average over the last two hours of data.

I know, but that's not the average for the last two hours. The last two hours show only 0 in the test1 field, so the average should be around 0. That's why the average doesn't make sense.

If I change the time to 1m I still get a huge number, even though in the last minute all test1 values are 0.

This was indeed a good approach, but the script behaves differently under the aggregation.
If you have any idea what the problem could be, that would be great. I've marked it as the solution for now.

My guess would be that you may have a data quality issue. Maybe there is a document with a very high difference between @timestamp and ts? This would skew the average to return that high number.

Well, I left it alone for a while, but I think I found the problem.
doc['ts'] gives the milliseconds from epoch.
doc['@timestamp'] also gives the milliseconds from epoch.
But when I try to subtract them I get:

"type": "class_cast_exception",
"reason": "class_cast_exception: Cannot apply operation to types [org.elasticsearch.index.fielddata.ScriptDocValues.Dates] and [org.elasticsearch.index.fielddata.ScriptDocValues.Dates]."

doc['ts'].value gives 0
doc['@timestamp'].value still gives millis from epoch

And when trying to subtract them (without adding .millis) I get an error:

"type": "class_cast_exception",
"reason": "class_cast_exception: Cannot apply operation to types [org.joda.time.MutableDateTime] and [org.joda.time.MutableDateTime]."

It's as if adding .value changes one of the types. In Elasticsearch they are both mapped as dates.

Do you have any idea what could cause such a thing? This is the reason I got numbers that don't add up: one of the fields was giving zero.
Is there any way to format them to be the same and make the subtraction work?

Thank you very much for your help so far

If doc['ts'].value returns 0 then I would think you've got a document with a ts field with a value of 0. This would be the data quality issue I was referring to.

Let's check a few things. Can you paste the output of:

GET X/_mapping

Let's also take a look at the docs that have the lowest and highest values of both ts and @timestamp. Can you paste the output of:

GET X/_msearch
{}
{"_source":["ts","@timestamp"],"size":1,"query":{"range":{"@timestamp":{"gt":"now-2h"}}},"sort":[{"ts":{"order":"asc"}}]}
{}
{"_source":["ts","@timestamp"],"size":1,"query":{"range":{"@timestamp":{"gt":"now-2h"}}},"sort":[{"ts":{"order":"desc"}}]}
{}
{"_source":["ts","@timestamp"],"size":1,"query":{"range":{"@timestamp":{"gt":"now-2h"}}},"sort":[{"@timestamp":{"order":"asc"}}]}
{}
{"_source":["ts","@timestamp"],"size":1,"query":{"range":{"@timestamp":{"gt":"now-2h"}}},"sort":[{"@timestamp":{"order":"desc"}}]}
"ts": {
            "type": "date"
          }

 "properties": {
          "@timestamp": {
            "type": "date"
          },

and for the query:

{
  "responses": [
    {
      "took": 471,
      "timed_out": false,
      "_shards": {
        "total": 13,
        "successful": 12,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 18194874,
        "max_score": null,
        "hits": [
          {
            "_index": "X",
            "_type": "logs",
            "_id": "v-Af6mwBFrPs1dZnQTpj",
            "_score": null,
            "_source": {
              "@timestamp": "2019-09-01T00:00:01.561Z",
              "ts": "2019-09-01T00:00:00,001"
            },
            "sort": [
              1567296000001
            ]
          }
        ]
      },
      "status": 200
    },
    {
      "took": 470,
      "timed_out": false,
      "_shards": {
        "total": 13,
        "successful": 12,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 18194874,
        "max_score": null,
        "hits": [
          {
            "_index": "X",
            "_type": "logs",
            "_id": "5d_S62wBfoJnQ0YMbO6D",
            "_score": null,
            "_source": {
              "@timestamp": "2019-09-01T07:48:58.298Z",
              "ts": "2019-09-01T07:48:57,793"
            },
            "sort": [
              1567324137793
            ]
          }
        ]
      },
      "status": 200
    },
    {
      "took": 136,
      "timed_out": false,
      "_shards": {
        "total": 13,
        "successful": 12,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 18194874,
        "max_score": null,
        "hits": [
          {
            "_index": "X",
            "_type": "logs",
            "_id": "vT8f6mwBfoJnQ0YMPSVJ",
            "_score": null,
            "_source": {
              "@timestamp": "2019-09-01T00:00:01.214Z",
              "ts": "2019-09-01T00:00:00,008"
            },
            "sort": [
              1567296001214
            ]
          }
        ]
      },
      "status": 200
    },
    {
      "took": 119,
      "timed_out": false,
      "_shards": {
        "total": 13,
        "successful": 12,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 18194874,
        "max_score": null,
        "hits": [
          {
            "_index": "X",
            "_type": "logs",
            "_id": "5d_S62wBfoJnQ0YMbO6D",
            "_score": null,
            "_source": {
              "@timestamp": "2019-09-01T07:48:58.298Z",
              "ts": "2019-09-01T07:48:57,793"
            },
            "sort": [
              1567324138298
            ]
          }
        ]
      },
      "status": 200
    }
  ]
}

I extended it to 24 hours since I had no docs from the last 2 hours.
It seems like there's no data quality issue. If I could just subtract them, it would work fine :pensive:

thanks again

I don't understand what's going on here :slightly_smiling_face:. If you run the avg aggregation on the last 24 hours of data, does it give you that unusually large number?

The only thing that looks off to me is the format of your ts dates. Those dates are not formatted like @timestamp, and are missing a timezone. A date like 2019-09-01T00:00:00,001 is not formatted in a way that Elasticsearch would accept by default.

Whenever I try to do arithmetic operations I get an error.
Only if I add .value.millis can I do the operations, but then the ts value is 0, and that's why I get bad results.

Do you have an idea how to format it correctly?

The problem is not on the script side, it is in your data:

PUT test/_doc/1
{
  "@timestamp": "2019-08-27T12:49:32.000Z",
  "ts": "2019-08-27T12:49:30.000Z"
}

I'm indexing a single document with the correct date format. And now I'm going to run the exact same aggregation that you used:

GET test/_search
{
  "script_fields": {
    "test1": {
      "script": {
        "lang": "painless",
        "source": "(doc['@timestamp'].value.millis - doc['ts'].value.millis) / 1000"
      }
    }
  },
  "aggs": {
    "average_delay": {
      "avg": {
        "script": {
          "source": "(doc['@timestamp'].value.millis - doc['ts'].value.millis) / 1000",
          "lang": "painless"
        }
      }
    }
  }
}

The output will be the following:

{
  "took" : 89,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "test1" : [
            2
          ]
        }
      }
    ]
  },
  "aggregations" : {
    "average_delay" : {
      "value" : 2.0
    }
  }
}

As you can see, the value from the document and from the bucket is the same. Now let's add a second document, with a 6-second difference:

PUT test/_doc/2
{
  "@timestamp": "2019-08-27T12:49:36.000Z",
  "ts": "2019-08-27T12:49:30.000Z"
}

I ran the exact same search as above and got the following results, which again are the ones we expect:

{
  "took" : 806,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "test1" : [
            2
          ]
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "fields" : {
          "test1" : [
            6
          ]
        }
      }
    ]
  },
  "aggregations" : {
    "average_delay" : {
      "value" : 4.0
    }
  }
}

Thanks for your help.
We've established that the problem is the data; I'm trying to find a solution.
Is there a way to handle it somehow? Cast it to another type? Anything?

I don't really know your data ingestion process, but my recommendation would be to use an ingest pipeline to fix your date format.
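As a sketch of that idea (the pipeline name is made up, and the comma-millis pattern is inferred from the ts values shown earlier), a date processor can reparse the field during indexing:

```json
PUT _ingest/pipeline/reparse-ts
{
  "description": "Parse the comma-millis ts field into a standard date",
  "processors": [
    {
      "date": {
        "field": "ts",
        "formats": ["yyyy-MM-dd'T'HH:mm:ss,SSS"],
        "target_field": "ts",
        "timezone": "UTC"
      }
    }
  ]
}
```

You would then index with ?pipeline=reparse-ts, or set the pipeline as the index default.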

The problem in this thread was: "Aggregate with a script yields different results" and it seems to be solved. I would recommend opening a new question for your new problem, which is now a date or data formatting issue.

@yavidor Are you sure your data is clean? Is it possible you have some documents that do not have the ts field?

doc['ts'].value gives 0
doc['@timestamp'].value still gives millis from epoch

This would happen in older versions (before 7.0) of Elasticsearch if the ts field is missing from a document. doc['ts'] is essentially a list of values, so you need to check doc['ts'].size().
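As a sketch, that check could look like this in your script field, using -1 as an arbitrary sentinel for documents without a ts value:

```json
GET X/_search
{
  "script_fields": {
    "test1": {
      "script": {
        "lang": "painless",
        "source": "doc['ts'].size() == 0 ? -1 : (doc['@timestamp'].value.millis - doc['ts'].value.millis) / 1000"
      }
    }
  }
}
```

For the avg aggregation itself, excluding those documents with an exists query is the more reliable fix.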

That indeed was the problem.
I solved it by filtering out the documents with the missing field:

GET X/_search
{
  "_source": ["ts", "@timestamp"],
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-1h"
            }
          }
        },
        {
          "exists": {
            "field": "ts"
          }
        }
      ]
    }
  },
  "aggs": {
    "avg_delay": {
      "avg": {
        "script": {
          "source": "(doc['@timestamp'].value.millis - doc['ts'].value.millis) / 1000",
          "lang": "painless"
        }
      }
    }
  }
}

Thanks to everyone who tried to help out!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.