Aggregation pagination and sort by document field

Hi,
I would like to know how to sort an aggregation result by a field in the document.
My indice configuration

{
  "id": {
    "type": "long"
  },
  "film_id": {
    "type": "long"
  },
  "film_group_id": {
    "type": "long"
  },
  "created": {
    "type": "date"
  },
  "metadata": {
    "properties": {
      "production": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "duration": {
        "type": "long"
      }
    }
  }
}

What's important to know here, is that the metadata content is the same for every documents with the same film_group_id.

What I'm trying to do, is to get only one document by film_group_id (because the metadata is the same for every documents), sort the results by whatever field in metadata (let's say the name), and do a pagination.

So far this is what I have

{
    "size": 0,
	"query": {
		"range": {
			"created": {
				"gt": "2020-06-01"
			}
		}
	},
	"aggs": {
		"docs": {
			"terms": {
				"field": "film_group_id",
				"size": page*page_size
			},
			"aggs": {
				"fields": {
					"top_hits": {
						"size": 1
					}
				},
				"pagination": {
					"bucket_sort": {
						"size": page_size,
						"from": (page_size-1)*page
					}
				}
			}
		}
	}
}

This query works, I have my pagination, but I couldn't find a way to sort on a field in metadata like metadata.name. I tried adding order in the terms aggregation and sort in the bucket_sort aggregation, but none works.

Is there a way to achieve this or should I rework my indice mapping ?
Thanks.

1 Like

You can sort the terms agg by the value of a sub-agg but there isn't an agg that'll get you the unique name. If you do the terms aggregation on the name it'll be sorted by the name by default. You could get the group_id from the top_hits, I think.

You'd probably be better off paginating using the composite agg. It'll support deep pagination. I'd do the pagination of the name field in the object.

What I need is the metadata for each film_group_id, I don't really need the film_group_id itself. But I need the metadata only once per film_group_id so that's why my aggs is on film_group_id and I need only 1 document in the top_hits (as all documents would have the same metadata). The current query works and gives me what I want. The only problem is that I can't sort the result on a field in my document.
I have updated my index mapping a bit.
Your solution doesn't fit my needs because I can't agg on a meta field as many film_group_id may have the same value for one field (here production) but not the other meta (duration). So how could I get the same results as my current query, and add a sort on the duration field?
About the composite agg, I didn't find a way to choose the previous page as I didn't find a before_key or from param. So I don't think I could use this as well.

You can't really sort the terms agg on a unique value from another field. Maybe you could have the terms agg take its value from a script that returns the concatenation of the production meta and the id.

Tree composite aggs pagination is documented here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html#_pagination

It really is much better for pagination. It'll be able to skip collecting quite a lot of hits. The performance will be wildly better for deep pagination.

I think instead of the script I mentioned you might be better off aggregating on the meta field then, inside that, aggregating on the film_group_id.

I tried something different, I do have the sort on a specific field, but now I don't have the pagination. Here's the new query:

{
  "aggs": {
    "sort_field": {
      "terms": {
        "field": "metadata.duration" // the field on which I want to sort
      },
      "aggs": {
        "by_film_group_id": {
          "terms": {
            "field": "film_group_id" // keep the agg on film_group_id
          },
          "aggs": {
            "fields": {
              "top_hits": {
                "size": 1 //still got 1 document by film_group_id which is what I want
              }
            }
          }
        }
      }
    }
  }
}

I indeed have the sort, but the performance is worst than before.

I read the documentation about composite aggs, and I understand that it will be better in performance, but I still don't understand how to get a previous page. There's only an after_key, but not a previous_key, and I would need that.

That's exactly what I tried! :smile: :laughing: :rofl: :innocent:

Nice! I'd look to adapting that into composite to get the pagination.

Could you give me a hint on how to get a previous page with composite agg?
Moreover, how could achieve a page change from page 1 to page 5, then a change to page 7, and finally a change to page 3? I don't think that's possible with the composite agg, isn't it?

Finally, I tried with a new index mapping.

{
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "metadata": {
        "properties": {
          "film_group_id": {
            "type": "long"
          },
          "production": {
            "type": "text"
          },
          "duration": {
            "type": "long"
          },
          "created": {
            "type": "date"
          }
        }
      },
      "film_group": {
        "type": "join",
        "relations": {
          "group": "film"
        }
      }
    }
  }
}

So when inserting the a document with a new film_group_id, I first insert a parent with this new film_group_id like so:

{
  "id": 1,
  "metadata": {
    "film_group_id": 1
    //other meta fields
  },
  "film_group": {
    "name": "group"
  }
}

And when inserting a new doc with existing film_group_id:

{
  "id": 7,
  "metadata": null,
  "film_group": {
    "name": "film",
    "parent": 1 //parent id which is also film_group_id
  }
}

Notice the metadata null value as the metadata is set on the parent insertion
Finally the query with the sort and the pagination:

{
  "sort": {
    "metadata.duration": "asc" //sort on a meta field
  },
  "from": 0, //pagination
  "size": 1, //pagination
  "query": {
    "bool": {
      "must": [
        {
          "has_child": {
            "type": "film",
            "query": {
              "match_all" : {}
            }
          }
        }
      ]
    }
  }
}

As only the parent documents have the metadata, I only need to look for the parent documents without any aggregation. I don't know how if the performance will follow along but for now, this solve my needs.
Anyway, thanks a lot for your help :smile:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.