How to get last X documents sorted by a timestamp

I have a huge set of email events of different types e.g. sent, delivered, complaint, bounced etc.

My goal is:

  • get the last 10k sent events sorted by a timestamp
  • find the timestamp of the earliest event in this subset
  • use terms aggregation to get the count of all event types starting from that time till now

I'm having troubles with the first bullet. Is there any way to achieve that with elastic search?

UP

Yes.

What did you try so far?

I've tired top hits aggregation but the size I need is too large, I can set the size param on the query itself but then I'll get 10k results in the output.

What I need, seems to be, an aggregation that will allow me to get only 10k first results and then another nested aggregation that will get only the first one of that set.

Ideally I would like to achieve all 3 steps in one query but if I can get an event timestamp in one query I know how to do the last aggregation in a separate query.

Why not just a normal search without aggs?

How can I instruct es to get 10k sorted events of specific type and then return only the first (or last) one?

filter by type of event. sort on the time stamp and set size to 1?

But I might be missing the problem.

Could you provide a full recreation script as described in About the Elasticsearch category. It will help to better understand what you are doing. Please, try to keep the example as simple as possible.

I can refrase: I have a sequence of events in desc order by timestamp and I want to know the timestamp of the event with index == 10k. If I reverse the order and get the first document, it will be the first event ever, but I need the event that happened 10k events ago.

[ first event ]------- a lot of events -------[ event that I need ]------- 10k events -------[ last event ]

Might traversing the results from back to front be an option?

What do you mean?

Basically do the final sort at the client. I do not think there is a way to do it in the original request.

Any ideas? I would really appreciate your help

We are talking about 10k results. Maybe I can use scripts somehow to increment a variable in search and cut off when it equals to 10k?

I think I've found the solution to my problem.

Elasticsearch default result window is 10k, it means that I can build a query like this:

GET events/_search
{
  "from": 9999,
  "size": 1,
  "sort": [
    {
      "event_timestamp": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "event_type": "_email.send"
          }
        }
      ]
    }
  }
}

It should sort all the events by event_timestamp, set the page size to 1 and get a 9999th page, so I suppose it should return the _email.send event that happened 9999 events ago.

@dadoonet Is it going to work as I think it should?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.