ESQL beginner grouping question

Hey,

I am currently trying to convert a lengthy query DSL query with terms, top hits and max aggregations into a ESQL query, but I am failing at a basic requirement. Imagine the following three documents:

PUT alr-test/_doc/1
{
    "@timestamp" : "2024-10-19T23:32:42.000Z",
    "day" : "2024-10-19T00:00:00.000Z",
    "minute_of_day" : 1412,
    "power" : 801
}

PUT alr-test/_doc/2
{
    "@timestamp" : "2024-10-19T21:32:42.000Z",
    "day" : "2024-10-19T00:00:00.000Z",
    "minute_of_day" : 1292,
    "power" : 1400
}

PUT alr-test/_doc/3
{
    "@timestamp" : "2023-06-24T20:35:47.000Z",
    "day" : "2023-06-24T00:00:00.000Z",
    "minute_of_day" : 1235,
    "power" : 18
}

What I would like to do: For each day, retrieve the latest @timestamp and power and day document. So for the 19th of October this would be 2024-10-19T23:32:42.000Z/801/2024-10-19T00:00:00.000Z and for 24th of june there is only one doc. I played around with

FROM alr-test
| STATS MAX(minute_of_day), TOP(@timestamp, 1, "desc") BY day,power

But this does not work, the moment I add more than one TOP function or all the fields in the BY statement.

I assume I am missing something absolutely trivial, that I don’t see after playing around. Thought about using VALUES() but that returns too much data. In SQL I would probably be joining with another CTA that only contains the latest timestamps for each day, which I cannot do here.

Thanks for any pointers!

–Alex

Hey Alex,

Just to clarify you say cannot do here because you don’t want to create an additional index or something else?

I’m asking because if you create an additional index with latest_timestamp value the following query will work.

FROM alr-test 
| LOOKUP join max_timestamps_index ON day 
| WHERE @timestamp == max_ts 
| DROP max_ts 
| SORT day ASC

ESQL-TOP vs DSL-top_hits

As far as I research because top_hits and TOP works differently it’s not possible to convert this DSL query to ESQL.

ESQL's TOP() function operates at the field level - it returns the top N values for each field independently. The top_hits aggregation operates at the document level - it returns the top N complete documents that match your criteria.

  • top_hits: Returns complete documents → preserves field relationships

  • TOP(): Returns individual field values → loses field relationships

Regards, Musab

Hey,

yes, with cannot do here I was more referring to being too much work for a relatively small index with a couple hundred thousand items. Then it would probably make more sense to have a transform that creates the right data without any additional query.

Your other remarks match with what I found, just way better written than I did - in the hopes that there is some functionality that I have been missing :slight_smile:

Thanks for your answer!

–Alex

Hey,

how about this?

FROM alr-test
| STATS MAX(minute_of_day), VALUES(@timestamp), VALUES(power) BY day
| RENAME `VALUES(@timestamp)` AS timestamps, `VALUES(power)` AS powers
| EVAL first_ts = MV_FIRST(timestamps), first_power = MV_FIRST(powers)
| KEEP first_ts, first_power

I am not sure of the order of the arrays is guaranteed, otherwise this would not work though.

–Alex

Hey,

Thanks for the reply. It looks like the order is not guaranteed because it’s sorted by first indexed _id. Even changing the order and not changing the _id ended up with wrong results. So we will see the first indexed document power value on response regardless of anything :slight_smile:

Note: Honestly, I’m surprised that ESQL doesn’t have this capability - or perhaps we just couldn’t find it. In aggregations + top_hits is one of the most frequently used queries, and many of my customers rely on it.