What's a good strategy for returning one document or many documents per group, depending on the group?

Hello, I'm new to Elasticsearch, so I apologize if this question is dumb.

Say I'm storing book pages. I want to be able to search for the single most relevant page per book.

My first thought is to have Book and Page types in my index, with Book as Page's parent in the mapping. But how do I query for the top n matching pages while restricting the result to at most one page per book?

Now suppose some kinds of books should be allowed to return more than one page from the same query. How should that be implemented? I think the Book type needs a field indicating whether multiple pages may be returned, but I can't wrap my head around how to formulate a query that produces that result.

If you don’t search for books but for pages, just index pages.

And put in the page object all information coming from the book, like title, author, isbn...
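
A minimal sketch of what that denormalization could look like before indexing, written in Python. The page-level field names (`page_number`, `text`) and the sample values are hypothetical; only `title`, `author` and `isbn` come from the suggestion above.

```python
def denormalize_pages(book, pages):
    """Return one flat document per page, each carrying the book's fields."""
    docs = []
    for page in pages:
        doc = dict(book)   # copy the book-level fields (title, author, isbn, ...)
        doc.update(page)   # add the page-level fields; on a name clash the page wins
        docs.append(doc)
    return docs


# Hypothetical sample data, for illustration only.
book = {"title": "Example Book", "author": "A. Writer", "isbn": "0000000000"}
pages = [
    {"page_number": 1, "text": "first page text"},
    {"page_number": 2, "text": "second page text"},
]

page_docs = denormalize_pages(book, pages)
for doc in page_docs:
    print(doc)
```

Each resulting dictionary would be indexed as its own document, so a search over pages can filter or group on book fields without a parent/child join.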

So basically denormalize the data structure?

Even when doing so, how do I tackle the second situation, where some books allow multiple pages to be returned while others allow only one?

Yes. Denormalize.

Maybe you can do that by using an aggregation on the “single-page” field, then a top hits aggregation under it with a size of 1?

Not sure I have a smarter idea. But someone else may share their thoughts.
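
One possible reading of this suggestion, sketched as a Python function that builds the search request body. A top hits aggregation takes a fixed `size`, so the sketch splits the documents into two `filter` branches and gives each branch its own size. It assumes a boolean field named `allow_multiple` (the inverse of the “single-page” flag, as in the test documents later in the thread) and a `group` field mapped so that a terms aggregation can run on it. The aggregation names and the `multi_size` parameter are made up for illustration.

```python
import json


def build_request(multi_size=3):
    """Build a search body with a different top_hits size per kind of group."""

    def branch(allow_multiple, size):
        # Keep only the documents of one kind, bucket them by group,
        # then return the best `size` hits of every bucket.
        return {
            "filter": {"term": {"allow_multiple": allow_multiple}},
            "aggs": {
                "by_group": {
                    "terms": {"field": "group"},
                    "aggs": {"top_pages": {"top_hits": {"size": size}}},
                }
            },
        }

    return {
        "size": 0,  # the hits are read from the aggregations instead
        "aggs": {
            "single_page": branch(False, 1),
            "multi_page": branch(True, multi_size),
        },
    }


body = build_request(multi_size=3)
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request, together with whatever `query` decides relevance. Note that terms buckets are ordered by document count by default, not by the score of their best hit, so the two branches would still have to be merged and sorted on the client side.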

Would you mind helping out a bit more? I tried an aggregation but couldn't see how to do a top hits with a dynamic size that depends on whether the group is single-page.

I have some test documents set up:
{
  "_index": "new_test",
  "_type": "new_test",
  "_id": "AV-4MYCYlk1DjGsyQCzP",
  "_score": 1,
  "_source": {
    "group": "A",
    "allow_multiple": true,
    "name": "AB"
  }
},
{
  "_index": "new_test",
  "_type": "new_test",
  "_id": "AV-4MZJVlk1DjGsyQCzQ",
  "_score": 1,
  "_source": {
    "group": "A",
    "allow_multiple": true,
    "name": "AC"
  }
},
{
  "_index": "new_test",
  "_type": "new_test",
  "_id": "AV-4Ma0Klk1DjGsyQCzR",
  "_score": 1,
  "_source": {
    "group": "B",
    "allow_multiple": false,
    "name": "BA"
  }
},
{
  "_index": "new_test",
  "_type": "new_test",
  "_id": "AV-4McCNlk1DjGsyQCzT",
  "_score": 1,
  "_source": {
    "group": "B",
    "allow_multiple": false,
    "name": "BC"
  }
},
{
  "_index": "new_test",
  "_type": "new_test",
  "_id": "AV-4Mba0lk1DjGsyQCzS",
  "_score": 1,
  "_source": {
    "group": "B",
    "allow_multiple": false,
    "name": "BB"
  }
},
{
  "_index": "new_test",
  "_type": "new_test",
  "_id": "AV-4MW3qlk1DjGsyQCzO",
  "_score": 1,
  "_source": {
    "group": "A",
    "allow_multiple": true,
    "name": "AA"
  }
}

"group" is, say, the book name, and "name" is just some data.

This query is as far as I'm able to get:
GET /new_test/new_test/_search
{
  "aggs": {
    "group": {
      "terms": { "field": "group" },
      "aggs": {
        "allow_multiple": {
          "terms": { "field": "allow_multiple" }
        }
      }
    }
  }
}

Basically, I want all 3 documents from group A returned, and only 1 from group B.
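
To pin down the expected result, here is a small client-side sketch in Python that applies this rule to the six test documents above. It is a different technique from the aggregation attempt: it assumes the hits of an ordinary search arrive sorted by relevance, and that enough of them are fetched to cover every group. The function name `limit_per_group` is made up for illustration.

```python
def limit_per_group(hits):
    """Keep every hit of an allow_multiple group, but only the first hit of other groups."""
    seen_groups = set()
    kept = []
    for hit in hits:  # hits are assumed to be sorted best-first already
        source = hit["_source"]
        group = source["group"]
        if source["allow_multiple"] or group not in seen_groups:
            kept.append(hit)
        seen_groups.add(group)
    return kept


def make_hit(group, allow_multiple, name):
    # Reduced form of the test documents: only the fields the rule needs.
    return {
        "_score": 1,
        "_source": {"group": group, "allow_multiple": allow_multiple, "name": name},
    }


# Same documents and order as listed in the thread.
hits = [
    make_hit("A", True, "AB"),
    make_hit("A", True, "AC"),
    make_hit("B", False, "BA"),
    make_hit("B", False, "BC"),
    make_hit("B", False, "BB"),
    make_hit("A", True, "AA"),
]

names = [hit["_source"]["name"] for hit in limit_per_group(hits)]
print(names)  # ['AB', 'AC', 'BA', 'AA']
```

All three group A documents survive and group B contributes only its first hit. The trade-off is that the filtering happens after the search, so the request has to ask for more hits than are finally shown.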

Any help would be greatly appreciated.

The top hits documentation is: Top hits aggregation | Elasticsearch Guide [8.11] | Elastic

If you can't make it work, could you provide a full recreation script as described in

It will help to better understand what you are doing.
Please, try to keep the example as simple as possible.
