Say I have an index with the following mapping:
PUT /item
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": { "type": "keyword" },
        "supplierName": { "type": "keyword" },
        "comments": {
          "type": "nested",
          "properties": {
            "username": { "type": "keyword" },
            "comment": { "type": "text" }
          }
        }
      }
    }
  }
}
I want to retrieve a specific number of comments made by a specific user on items from a specific supplier.
A user can comment on an item as many times as they want and a supplier can have many items.
Example:
PUT /item/_doc/1?refresh
{
  "name": "ItemOne",
  "supplierName": "CoolSupplier",
  "comments": [
    { "username": "mark", "comment": "Cool item1" },
    { "username": "mark", "comment": "Cool item2" },
    { "username": "mark", "comment": "Cool item3" },
    { "username": "mark", "comment": "Cool item4" },
    { "username": "mark", "comment": "Cool item5" },
    { "username": "mark", "comment": "Cool item6" },
    { "username": "jake", "comment": "Bad item" },
    { "username": "paul", "comment": "Great item" }
  ]
}
So, say I want to retrieve a certain number of comments, along with the name of the item each was made on, for a specific supplier and user, regardless of whether the comments are all on a single item or spread across several.
I can use nested inner_hits like this:
GET item/_search
{
  "size": 4,
  "_source": "name",
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "supplierName": "CoolSupplier"
          }
        },
        {
          "nested": {
            "path": "comments",
            "query": {
              "match": {
                "comments.username": "mark"
              }
            },
            "inner_hits": {
              "size": 4
            }
          }
        }
      ]
    }
  }
}
With this query, up to four comments can be returned per parent document, so with "size": 4 parent documents that is up to 16 comments in total.
The thing is, I only want 4 nested documents in total. Those four could all come from the first matching parent document, or one from each of four different parent documents.
Is there a way to specify a maximum total number of inner_hits to return, regardless of parent document?
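To make the result I am after concrete, here is a rough client-side sketch of the trimming I would otherwise have to do on the inner_hits response. It assumes the response has already been parsed into a Python dict, that the inner hits are keyed by the nested path ("comments", the default when no custom name is set), and the helper name first_n_comments is made up for illustration.

```python
from typing import Any, Dict, List


def first_n_comments(response: Dict[str, Any], limit: int) -> List[Dict[str, Any]]:
    """Flatten nested inner_hits across parent hits and keep at most `limit`."""
    collected: List[Dict[str, Any]] = []
    for parent in response["hits"]["hits"]:
        # "name" is available because the query sets "_source": "name"
        item_name = parent["_source"]["name"]
        # Assumption: inner hits are keyed by the nested path "comments"
        inner = parent["inner_hits"]["comments"]["hits"]["hits"]
        for hit in inner:
            if len(collected) == limit:
                return collected
            # Attach the parent's name to each nested comment
            collected.append({"name": item_name, **hit["_source"]})
    return collected
```

This gives the shape I want (each comment paired with its item name), but it still fetches more nested documents than needed, which is what I would like to avoid.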
Another alternative I found is the top_hits metric aggregation:
GET item/_search
{
  "size": 0,
  "aggs": {
    "outerFilter": {
      "filter": {
        "match": {
          "supplierName": "CoolSupplier"
        }
      },
      "aggs": {
        "commentAggs": {
          "nested": {
            "path": "comments"
          },
          "aggs": {
            "commentsFilter": {
              "filter": {
                "match": {
                  "comments.username": "mark"
                }
              },
              "aggs": {
                "foundComments": {
                  "top_hits": {
                    "size": 4
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
This correctly returns a maximum of 4 comments regardless of parent document. The only problem is that I also want to retrieve information from the parent document. Is there a way to do this?
If I wanted to retrieve, say, the name of the item the comment was made on, how would I do that? Would I need to run the aggs query and then run a second query with all of the item ids to retrieve the names?
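For reference, this is roughly what that two-step approach would look like on the client. It is only a sketch: it assumes the aggregation response is parsed into a Python dict, that each top_hits hit under the nested aggregation carries the parent document's _id, and the helper name parent_lookup_query is hypothetical. The aggregation names are the ones from my query above.

```python
from typing import Any, Dict, List


def parent_lookup_query(agg_response: Dict[str, Any]) -> Dict[str, Any]:
    """Build a follow-up `ids` query from the parent `_id`s of the top_hits."""
    hits = (agg_response["aggregations"]["outerFilter"]["commentAggs"]
            ["commentsFilter"]["foundComments"]["hits"]["hits"])
    # Assumption: hit["_id"] is the id of the parent item document.
    # dict.fromkeys removes duplicates while keeping first-seen order.
    parent_ids: List[str] = list(dict.fromkeys(hit["_id"] for hit in hits))
    # Body for a second GET item/_search that fetches only the item names
    return {"_source": "name", "query": {"ids": {"values": parent_ids}}}
```

The returned dict would be sent as the body of a second search, and the names joined back to the comments by _id. It works in principle, but it means two round trips, which is why I am hoping there is a single-query way.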
Or, if the parent document had a createDate timestamp and I wanted to order by that, is that possible using the top_hits aggregation? I haven't been able to figure it out.
Should the mapping perhaps use the join datatype instead of nested?