Hi Daniel, thanks for your interest in Elasticsearch!
If you just want one of the projects at a time, this is very straightforward - you just make a query sorted by the date:
GET testindex/_search
{
"query": {
"match": {
"project_id": "1"
}
},
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"size": 1
}
If you want to retrieve the most recent document for all projects in one query, you can use Field Collapsing like so:
GET testindex/_search
{
"size": 10,
"query": {
"match_all": {}
},
"collapse": {
"field": "project_id",
"inner_hits": {
"name": "most_recent",
"size": 1,
"sort": [{"timestamp": "desc"}]
}
}
}
You'll have to adjust the size parameter based on the number of projects you have, of course, and the response is a bit verbose:
{
//some fields elided for readability
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "testindex",
"_type" : "_doc",
"_id" : "MP0idGkBCYiTQNOubpiI",
"_score" : 1.0,
"_source" : {
"project_id" : "1",
"hours_remaining" : 10,
"timestamp" : "2019-03-10"
},
"fields" : {
"project_id" : [
"1"
]
},
"inner_hits" : {
"most_recent" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "testindex",
"_type" : "_doc",
"_id" : "Mv0idGkBCYiTQNOubpiI",
"_score" : null,
"_source" : {
"project_id" : "1",
"hours_remaining" : 5,
"timestamp" : "2019-03-14"
},
"sort" : [
1552521600000
]
}
]
}
}
}
},
{
"_index" : "testindex",
"_type" : "_doc",
"_id" : "Mf0idGkBCYiTQNOubpiI",
"_score" : 1.0,
"_source" : {
"project_id" : "2",
"hours_remaining" : 20,
"timestamp" : "2019-03-10"
},
"fields" : {
"project_id" : [
"2"
]
},
"inner_hits" : {
"most_recent" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "testindex",
"_type" : "_doc",
"_id" : "Mf0idGkBCYiTQNOubpiI",
"_score" : null,
"_source" : {
"project_id" : "2",
"hours_remaining" : 20,
"timestamp" : "2019-03-10"
},
"sort" : [
1552176000000
]
}
]
}
}
}
}
]
}
}
The thing to pay attention to is the inner_hits
field of each hit - the _source
directly inside each hit is just the first document encountered for each project_id
. So for example, for the first result, you'd look at the value of .hits.hits[0].inner_hits.most_recent.hits.hits[0]._source
, for the second, .hits.hits[1].inner_hits.most_recent.hits.hits[0]._source
, and so on.
Does that help get you what you need?
It's also worth noting that Elasticsearch does support a limited subset of SQL, although I don't think that it would be helpful in this case as it doesn't support the necessary GROUP BY functions (yet), but it may be helpful as you explore Elasticsearch.