I'm trying to understand the difference between these two types of query and when to use each.
I have documents that have a "unique_hash" value, and I want to return the latest document for each unique hash value within a given query. The script that consumes this data needs to iterate over all matching documents, so to keep response times reasonable I have explored paging with size/from (for the collapse query) and partitioning (for the terms aggregation), respectively.
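To pin down the intended result, here is a minimal plain-Python sketch of "latest document per unique_hash" that could be used to sanity-check either query against a small sample. The document shape (dicts with `unique_hash` and `created_at`, the latter an ISO 8601 string) and the function name `latest_per_hash` are assumptions for illustration.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def latest_per_hash(docs: Iterable[dict]) -> Dict[str, dict]:
    """Keep only the most recent document for each unique_hash value.

    Assumes (hypothetically) that each document is a dict with "unique_hash"
    and "created_at" keys, where created_at is an ISO 8601 string.
    """
    latest: Dict[str, dict] = {}
    for doc in docs:
        key = doc["unique_hash"]
        ts = datetime.fromisoformat(doc["created_at"])
        current = latest.get(key)
        # Replace the stored document only if this one is strictly newer.
        if current is None or ts > datetime.fromisoformat(current["created_at"]):
            latest[key] = doc
    return latest


if __name__ == "__main__":
    docs: List[dict] = [
        {"id": 1, "unique_hash": "a", "created_at": "2021-01-01T00:00:00"},
        {"id": 2, "unique_hash": "a", "created_at": "2021-03-01T00:00:00"},
        {"id": 3, "unique_hash": "b", "created_at": "2021-02-01T00:00:00"},
    ]
    result = latest_per_hash(docs)
    print(sorted(d["id"] for d in result.values()))  # [2, 3]
```

Both Elasticsearch queries are meant to produce this same mapping, one group (or bucket) per hash.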
As far as I can tell, I can achieve this with either:
{
  "collapse": {
    "field": "unique_hash.keyword",
    "inner_hits": {
      "name": "latest",
      "size": 1,
      "sort": [
        { "created_at": "desc" }
      ]
    }
  },
  "size": 100,
  "from": 100
}
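Iterating over every collapsed group with size/from could look like the sketch below. `search` is a hypothetical callable standing in for whatever client the script uses; it is assumed to take a request body and return `response["hits"]["hits"]`. The inner hit is sorted by `created_at` descending so the newest document is returned. The loop stops on a short page rather than using the hit total, because with collapse the reported total counts matching documents, not groups. It also stops before `from + size` would exceed `index.max_result_window` (10,000 by default), so from/size paging alone cannot reach groups beyond that window.

```python
from typing import Callable, Iterator, List

# Hypothetical signature: takes a request body, returns the list of hits.
SearchFn = Callable[[dict], List[dict]]


def collapse_body(offset: int, page_size: int) -> dict:
    """Build the collapse query for one page, with the inner hit sorted newest-first."""
    return {
        "collapse": {
            "field": "unique_hash.keyword",
            "inner_hits": {
                "name": "latest",
                "size": 1,
                "sort": [{"created_at": "desc"}],
            },
        },
        "size": page_size,
        "from": offset,
    }


def iterate_collapsed(search: SearchFn, page_size: int = 100,
                      max_window: int = 10_000) -> Iterator[dict]:
    """Page through collapsed hits with from/size until a short page comes back.

    Stops before from + size would exceed max_window, which should match the
    index's index.max_result_window setting (10,000 by default).
    """
    offset = 0
    while offset + page_size <= max_window:
        hits = search(collapse_body(offset, page_size))
        yield from hits
        if len(hits) < page_size:
            break  # a short (or empty) page means there are no more groups
        offset += page_size
```

Each page is an independent request, so deep pages get more expensive as `from` grows; that cost, and the window cap, are the main things to weigh against the partitioned aggregation.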
or:
{
  "aggs": {
    "unique_hashes": {
      "terms": {
        "field": "unique_hash.keyword",
        "size": 100000,
        "include": {
          "partition": 1,
          "num_partitions": 100
        }
      },
      "aggs": {
        "vulnerabilities": {
          "top_hits": {
            "sort": [
              {
                "created_at": "desc"
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}
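Iterating with the partitioned aggregation means sending one request per partition, numbered 0 through `num_partitions - 1`. In the sketch below, `search` is a hypothetical callable assumed to return the `unique_hashes` aggregation from the response (the dict containing `buckets` and `sum_other_doc_count`). The top-level `"size": 0` is an addition not in the query above; it skips the ordinary hits, since only the buckets are needed. The overflow check is a hedged safety net: a non-zero `sum_other_doc_count` suggests some terms in that partition did not fit in `size`.

```python
from typing import Callable, Iterator

# Hypothetical signature: takes a request body and returns the
# "unique_hashes" terms aggregation from the response.
AggSearchFn = Callable[[dict], dict]


def partition_body(partition: int, num_partitions: int, size: int) -> dict:
    """Build the terms + top_hits aggregation for a single partition."""
    return {
        "size": 0,  # only the aggregation is needed, not the top-level hits
        "aggs": {
            "unique_hashes": {
                "terms": {
                    "field": "unique_hash.keyword",
                    "size": size,
                    "include": {"partition": partition, "num_partitions": num_partitions},
                },
                "aggs": {
                    "vulnerabilities": {
                        "top_hits": {"sort": [{"created_at": "desc"}], "size": 1}
                    }
                },
            }
        },
    }


def iterate_partitions(search: AggSearchFn, num_partitions: int = 100,
                       size: int = 100_000) -> Iterator[dict]:
    """Yield every bucket, visiting partitions 0 .. num_partitions - 1."""
    for partition in range(num_partitions):
        agg = search(partition_body(partition, num_partitions, size))
        if agg.get("sum_other_doc_count", 0) > 0:
            # Some terms in this partition did not fit in `size`; raise
            # num_partitions (or size) rather than silently dropping them.
            raise ValueError(f"partition {partition} overflowed size={size}")
        yield from agg["buckets"]
```

Unlike from/size paging, the number of requests here is fixed by `num_partitions` and does not depend on a result window, but each partition still has to fit in a single response.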
In terms of response time, the difference between the two doesn't seem large, so I'm wondering which approach is more appropriate for my use case.