Greetings.
I need to fetch the oldest or newest document saved into a beat. Someone asked this before and this was the suggested course of action:
Hello, is there a solution to get this information?
This approach will NOT work for metricbeat data because you cannot snag over 10000 records at a time and metricbeat produces almost 9000 in one day.
Any suggestions? Thank you!
One of the indices I'm looking at has 57,150,950 documents. That's just one of them. : /
spinscale (Alexander Reelsen), December 18, 2019, 8:54am
What is the problem with sorting by timestamp and returning only a single document? Execute two searches, one with sort asc and the other with sort desc.
Would that work or is there an issue with that approach that I might have missed in your post?
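The two-search suggestion above could be sketched roughly as follows. This is a minimal sketch, not the poster's code: the helper name `first_and_last` and its parameters are hypothetical, `es` is assumed to be an `elasticsearch-py` 7.x style client whose `search` accepts `index`, `size`, and `body`, and `@timestamp` is assumed to be a date field in every matched index.

```python
def first_and_last(es, index, query=None, ts_field="@timestamp"):
    """Return (oldest, newest) _source dicts, or None for each if nothing matches.

    `es` is assumed to be an elasticsearch-py style client (hypothetical helper).
    """
    def one(order):
        body = {
            "query": query or {"match_all": {}},
            "sort": [{ts_field: {"order": order}}],
        }
        # size=1 asks for only the top-sorted hit, so the 10,000-hit
        # result window is never reached, however large the index is.
        resp = es.search(index=index, size=1, body=body)
        hits = resp["hits"]["hits"]
        return hits[0]["_source"] if hits else None

    return one("asc"), one("desc")


# Usage (assumes a reachable cluster; the URL is a placeholder):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://localhost:9200")
#   oldest, newest = first_and_last(es, "metricbeat-*")
```

Sorting happens on the Elasticsearch side, so each request returns one document regardless of how many documents the index pattern holds.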
There are way too many documents. I'm querying with Python, and a single search returns at most 10k documents. With metricbeat data I can go back about a day but that's it.
spinscale (Alexander Reelsen), December 19, 2019, 9:06am
It seems we are now talking about something different. From what I read, your initial ask was to retrieve only a single document.
If you need to get more than 10k documents, then use a scroll search.
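For the case where more than 10k documents really are needed, a scroll search could look roughly like this. It is a sketch under assumptions: the generator name `scroll_all`, the page size, and the keep-alive value are all hypothetical, and `es` is assumed to be an `elasticsearch-py` 7.x style client exposing `search`, `scroll`, and `clear_scroll`.

```python
def scroll_all(es, index, query, page_size=1000, keep_alive="2m"):
    """Yield the _source of every matching document, one page at a time.

    Hypothetical helper; `es` is assumed to be an elasticsearch-py style client.
    """
    # The first request opens the scroll context and returns the first page.
    resp = es.search(index=index, body={"query": query},
                     size=page_size, scroll=keep_alive)
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit["_source"]
            # Each follow-up call returns the next page and renews the keep-alive.
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context once done (or on error).
        if scroll_id:
            es.clear_scroll(scroll_id=scroll_id)


# Usage (assumes a reachable cluster):
#   for doc in scroll_all(es, "metricbeat-*", {"exists": {"field": "host"}}):
#       handle(doc)  # handle() is a placeholder
```

The Python client also ships `elasticsearch.helpers.scan`, which wraps the same scroll loop, so the hand-written version above is mainly useful for seeing what happens on each request.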
Okay... I need to get the oldest of 15 million records. I don't think I can do it.
For instance, if I do this query, I can go back a few days. But that's all.
resr = esr.search(index="metricbeat-*", size=documentSize, body={
    "query": {
        "bool": {
            "must": [
                {"term": {"fields": thisId}},
                {"exists": {"field": "host"}}
            ]
        }
    },
    "sort": [
        {
            "@timestamp": {
                "order": "desc"
            }
        }
    ]
})
return resr['hits']['hits'][0]['_source']
I'll look into the scroll functionality and see if I can get it to work. Thank you for the suggestion.
To get around this problem, I am changing my approach. Of course, that created a new issue, so here's the follow-up question:
Hi there.
I'm trying to aggregate host configuration data out of metricbeat. I need to look at the documents for a single day, bucketing by ID, and return the host info. What I'm trying to do is watch for configuration changes on a daily basis.
If I use this code:
"query": {
    "bool": {
        "must": [
            {"exists": {"field": "host"}},
            {"term": {"@timestamp": date}}
        ]
    }
},
"aggs": {
    "by_id": {
        "terms": {"field": "fields.id"}
    }
}
I get an array out containing our fields.id …
Thank you!
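One possible way to express the stated goal (one day of documents, bucketed by ID, with host info per bucket) is sketched below. The rest of the question is cut off in the thread, so this is a guess at a workable shape rather than an answer to the elided problem. The function name `daily_host_config_body` and the `max_ids` parameter are hypothetical. The sketch assumes a `range` filter in place of the `term` on `@timestamp` (a `term` query matches an exact value rather than a whole day) and a `top_hits` sub-aggregation to return the latest `host` object for each `fields.id`.

```python
from datetime import date, timedelta


def daily_host_config_body(day, id_field="fields.id", max_ids=1000):
    """Build a search body: latest host info per ID for one calendar day.

    Hypothetical helper; field names follow the query in the post.
    """
    next_day = day + timedelta(days=1)
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"exists": {"field": "host"}},
                    # Half-open range covering the whole day.
                    {"range": {"@timestamp": {"gte": day.isoformat(),
                                              "lt": next_day.isoformat()}}},
                ]
            }
        },
        "aggs": {
            "by_id": {
                # terms returns only 10 buckets unless size is raised.
                "terms": {"field": id_field, "size": max_ids},
                "aggs": {
                    "latest": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"@timestamp": {"order": "desc"}}],
                            "_source": {"includes": ["host"]},
                        }
                    }
                },
            }
        },
    }


# Usage (assumes an elasticsearch-py style client named es):
#   body = daily_host_config_body(date(2019, 12, 18))
#   resp = es.search(index="metricbeat-*", body=body)
#   for bucket in resp["aggregations"]["by_id"]["buckets"]:
#       host = bucket["latest"]["hits"]["hits"][0]["_source"]["host"]
```

Comparing the per-ID `host` objects from two consecutive days would then be one way to spot configuration changes.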