Get oldest / newest document in *beat

Greetings.

I need to fetch the oldest or newest document saved into a beat. Someone asked this before and this was the suggested course of action:

This approach will NOT work for metricbeat data because you cannot snag over 10000 records at a time and metricbeat produces almost 9000 in one day.

Any suggestions? Thank you!

One of the indices I'm looking at has 57,150,950 documents. That's just one of them. : /

What is the problem with sorting by timestamp and returning only a single document? Execute two searches, one with sort asc and the other with sort desc.

Would that work or is there an issue with that approach that I might have missed in your post?
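
The two-search idea above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names build_edge_query and oldest_and_newest are hypothetical, the match_all query is an assumption (swap in your own filter), and es is assumed to be an elasticsearch-py client called in the same index=/body= style used later in this thread. Because each search asks for size 1, the 10k result window should not come into play no matter how many documents the index holds.

```python
def build_edge_query(order, timestamp_field="@timestamp"):
    """Return a search body asking for exactly one document, sorted by time."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "size": 1,  # only the first hit in sort order is needed
        "query": {"match_all": {}},  # assumption: replace with your own query
        "sort": [{timestamp_field: {"order": order}}],
    }


def oldest_and_newest(es, index="metricbeat-*"):
    """Run two searches: ascending for the oldest, descending for the newest."""
    results = {}
    for label, order in (("oldest", "asc"), ("newest", "desc")):
        resp = es.search(index=index, body=build_edge_query(order))
        hits = resp["hits"]["hits"]
        # An empty index pattern yields no hits, so guard against that.
        results[label] = hits[0]["_source"] if hits else None
    return results
```

With a real client this would be called as oldest_and_newest(esr), where esr is the client object from the snippet further down.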

There are way too many documents. I'm querying with Python, and a single search returns at most 10k documents. With metricbeat data I can go back about a day, but that's it.

It seems we are now talking about something different. From what I read, your initial ask was to retrieve only a single document.

If you need to get more than 10k documents, then use a scroll search.
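
For reference, a scroll search can be sketched like this. It is a rough sketch under assumptions: iter_all_hits is a hypothetical helper name, the page size and keep-alive values are arbitrary, and es is assumed to be an elasticsearch-py client that accepts search(..., scroll=..., size=...), scroll(scroll_id=..., scroll=...) and clear_scroll(scroll_id=...). The library also ships elasticsearch.helpers.scan, which wraps the same loop.

```python
def iter_all_hits(es, index, query, page_size=1000, keep_alive="2m"):
    """Yield every matching hit by paging through a scroll context."""
    resp = es.search(
        index=index,
        body={"query": query},
        scroll=keep_alive,  # how long the server keeps the scroll context open
        size=page_size,     # hits per page, not the overall total
    )
    scroll_id = resp.get("_scroll_id")
    try:
        # An empty page means the scroll is exhausted.
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context when done or on error.
        if scroll_id:
            es.clear_scroll(scroll_id=scroll_id)
```

Because this is a generator, documents are processed one page at a time rather than being loaded into memory all at once.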

Okay... I need to get the oldest of 15 million records. I don't think I can do it.

For instance, if I do this query, I can go back a few days. But that's all.

resr = esr.search(index="metricbeat-*", size=documentSize, body={
    "query": {
        "bool": {
            "must": [
                {"term": {"fields": thisId}},
                {"exists": {"field": "host"}}
            ]
        }
    },
    # Newest first; use "asc" here to get the oldest document instead.
    "sort": [
        {"@timestamp": {"order": "desc"}}
    ]
})
return resr["hits"]["hits"][0]["_source"]

I'll look into the scroll functionality and see if I can get it to work. Thank you for the suggestion.

To get around this problem, I am changing my approach. Of course, that created a new issue... But here's the follow-up question:

Thank you!
