Aggregate combinations of nested documents

Using Elasticsearch, I would like to aggregate combinations of nested documents.

Take a hypothetical index of movie data with these mappings:

{
	mappings: {
		properties: {
			title: {
				type: 'keyword'
			},
			people: {
				type: 'nested',
				properties: {
					id: {
						type: 'keyword'
					},
					name: {
						type: 'keyword'
					},
					role: {
						type: 'keyword'
					}
				}
			}
		}
	}
}

And these docs:

{
	title: "Goodfellas",
	people: [
		{ id: '101', name: "Martin Scorsese", role: "Director" },
		{ id: '102', name: "Robert De Niro", role: "Actor" },
		{ id: '103', name: "Ray Liotta", role: "Actor" },
		{ id: '104', name: "Joe Pesci", role: "Actor" },
		{ id: '105', name: "Frank Vincent", role: "Actor" }
	]
},
{
	title: "Cape Fear",
	people: [
		{ id: '101', name: "Martin Scorsese", role: "Director" },
		{ id: '102', name: "Robert De Niro", role: "Actor" },
		{ id: '106', name: "Nick Nolte", role: "Actor" },
		{ id: '107', name: "Jessica Lange", role: "Actor" }
	]
},
{
	title: "Casino",
	people: [
		{ id: '101', name: "Martin Scorsese", role: "Director" },
		{ id: '102', name: "Robert De Niro", role: "Actor" },
		{ id: '108', name: "Sharon Stone", role: "Actor" },
		{ id: '104', name: "Joe Pesci", role: "Actor" },
		{ id: '105', name: "Frank Vincent", role: "Actor" }
	]
},
{
	title: "Heat",
	people: [
		{ id: '109', name: "Michael Mann", role: "Director" },
		{ id: '110', name: "Al Pacino", role: "Actor" },
		{ id: '102', name: "Robert De Niro", role: "Actor" },
		{ id: '111', name: "Val Kilmer", role: "Actor" }
	]
},
{
	title: "The Irishman",
	people: [
		{ id: '101', name: "Martin Scorsese", role: "Director" },
		{ id: '102', name: "Robert De Niro", role: "Actor" },
		{ id: '110', name: "Al Pacino", role: "Actor" },
		{ id: '104', name: "Joe Pesci", role: "Actor" }
	]
}

Is there a way to aggregate pairs of people who appear in the same movie, without fixing a specific person as the starting point? E.g.

  • Martin Scorsese and Robert De Niro: 4
  • Martin Scorsese and Joe Pesci: 3
  • Robert De Niro and Joe Pesci: 3
  • Robert De Niro and Al Pacino: 2
  • Martin Scorsese and Ray Liotta: 1
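As a reference point, the expected pair counts can be reproduced by brute force in plain JavaScript over the five sample documents (no Elasticsearch involved). This only pins down what an aggregation-based solution should return; `movies` and `countPairs` are illustrative names.

```javascript
// The five sample documents, reduced to title and people's names.
const movies = [
  { title: 'Goodfellas', people: ['Martin Scorsese', 'Robert De Niro', 'Ray Liotta', 'Joe Pesci', 'Frank Vincent'] },
  { title: 'Cape Fear', people: ['Martin Scorsese', 'Robert De Niro', 'Nick Nolte', 'Jessica Lange'] },
  { title: 'Casino', people: ['Martin Scorsese', 'Robert De Niro', 'Sharon Stone', 'Joe Pesci', 'Frank Vincent'] },
  { title: 'Heat', people: ['Michael Mann', 'Al Pacino', 'Robert De Niro', 'Val Kilmer'] },
  { title: 'The Irishman', people: ['Martin Scorsese', 'Robert De Niro', 'Al Pacino', 'Joe Pesci'] },
];

// Count how many movies each unordered pair of people appears in together.
function countPairs(docs) {
  const counts = new Map();
  for (const { people } of docs) {
    for (let i = 0; i < people.length; i++) {
      for (let j = i + 1; j < people.length; j++) {
        // Sort the two names so (A, B) and (B, A) share one key.
        const key = [people[i], people[j]].sort().join(' and ');
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Print the pairs with the highest counts first.
const ranked = [...countPairs(movies)].sort((a, b) => b[1] - a[1]);
for (const [pair, count] of ranked.slice(0, 5)) {
  console.log(`${pair}: ${count}`);
}
```

Several pairs tie on 2 in this data (Frank Vincent with each of Martin Scorsese, Robert De Niro and Joe Pesci, as well as Robert De Niro with Al Pacino), so any size-limited "top N" result needs a tie-breaking rule.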

I would also like to:

Specify Director-Actor pairs only, e.g.

  • Martin Scorsese and Robert De Niro: 4
  • Martin Scorsese and Joe Pesci: 3
  • Martin Scorsese and Ray Liotta: 1
  • Martin Scorsese and Nick Nolte: 1
  • Michael Mann and Robert De Niro: 1

Increase the pairs to triples, quadruples, etc., e.g. triples:

  • Martin Scorsese and Robert De Niro and Joe Pesci: 3
  • Martin Scorsese and Robert De Niro and Frank Vincent: 2
  • Martin Scorsese and Robert De Niro and Ray Liotta: 1
  • Martin Scorsese and Ray Liotta and Joe Pesci: 1
  • Robert De Niro and Ray Liotta and Frank Vincent: 1

Include the movies each combination is derived from (which would perhaps require a multi-level aggregation), e.g.

  • Martin Scorsese and Robert De Niro: 4 (Goodfellas, Cape Fear, Casino, The Irishman)
  • Martin Scorsese and Joe Pesci: 3 (Goodfellas, Casino, The Irishman)
  • Robert De Niro and Joe Pesci: 3 (Goodfellas, Casino, The Irishman)
  • Robert De Niro and Al Pacino: 2 (Heat, The Irishman)
  • Martin Scorsese and Ray Liotta: 1 (Goodfellas)
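All three variations can be expressed as one brute-force routine: enumerate every k-person combination within a movie, optionally keep only the combinations that pass a predicate (here, one Director plus one Actor), and record the titles each combination came from, so the count is just the length of that list. The sketch below is plain JavaScript over three of the sample documents (Goodfellas, Heat and The Irishman); `combinations`, `aggregateCombinations` and `isDirectorActor` are illustrative names, not part of any Elasticsearch API. It also shows why the work grows quickly: a movie with n people yields C(n, k) combinations, e.g. 19,600 triples for a cast of 50.

```javascript
// Three of the sample documents, trimmed to title, name and role.
const movies = [
  {
    title: 'Goodfellas',
    people: [
      { name: 'Martin Scorsese', role: 'Director' },
      { name: 'Robert De Niro', role: 'Actor' },
      { name: 'Ray Liotta', role: 'Actor' },
      { name: 'Joe Pesci', role: 'Actor' },
      { name: 'Frank Vincent', role: 'Actor' },
    ],
  },
  {
    title: 'Heat',
    people: [
      { name: 'Michael Mann', role: 'Director' },
      { name: 'Al Pacino', role: 'Actor' },
      { name: 'Robert De Niro', role: 'Actor' },
      { name: 'Val Kilmer', role: 'Actor' },
    ],
  },
  {
    title: 'The Irishman',
    people: [
      { name: 'Martin Scorsese', role: 'Director' },
      { name: 'Robert De Niro', role: 'Actor' },
      { name: 'Al Pacino', role: 'Actor' },
      { name: 'Joe Pesci', role: 'Actor' },
    ],
  },
];

// Yield every k-element combination of `items`, preserving their order.
function* combinations(items, k, start = 0, prefix = []) {
  if (prefix.length === k) {
    yield prefix;
    return;
  }
  for (let i = start; i <= items.length - (k - prefix.length); i++) {
    yield* combinations(items, k, i + 1, [...prefix, items[i]]);
  }
}

// Deterministic ordering by name, so each combination has one canonical key.
const byName = (a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0);

// Map each combination key ("A and B and C") to the titles it occurs in.
function aggregateCombinations(docs, k, accept = () => true) {
  const result = new Map();
  for (const { title, people } of docs) {
    const sorted = [...people].sort(byName);
    for (const combo of combinations(sorted, k)) {
      if (!accept(combo)) continue;
      const key = combo.map((p) => p.name).join(' and ');
      if (!result.has(key)) result.set(key, []);
      result.get(key).push(title);
    }
  }
  return result;
}

// Keep only combinations containing at least one Director and one Actor.
const isDirectorActor = (combo) =>
  combo.some((p) => p.role === 'Director') && combo.some((p) => p.role === 'Actor');

// Pairs with their source movies; pass 3 for triples, or isDirectorActor as a filter.
for (const [key, titles] of aggregateCombinations(movies, 2)) {
  console.log(`${key}: ${titles.length} (${titles.join(', ')})`);
}
```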

The potential solutions I can think of are:

  • Calculate the pairs before indexing each document and store them in a property that can be used as the term to aggregate on, e.g. a set of compoundId values that for Goodfellas would be: 101-102, 101-103, 101-104, 101-105, 102-103, 102-104, 102-105, 103-104, 103-105, 104-105 (though some subsequent logic would be needed to look up the names of the people represented by these IDs).
  • Write a Painless script that calculates the pairs at query time. However, given the number of combinations each document can produce, and the need to repeat that across a large number of documents (say ~1m), such a query could easily be too slow to be practical for repeated use in a live application.
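A minimal sketch of the first option, assuming a hypothetical root-level keyword field named `pairIds` (not part of the mappings above) that application code fills in before each document is indexed. Because the field sits on the root document rather than inside the nested `people` objects, an ordinary `terms` aggregation counts movies per pair, and a `terms` sub-aggregation on `title` lists the movies behind each count. The helper `withPairIds` and the variable names are illustrative.

```javascript
// Hypothetical pre-indexing step: add a root-level `pairIds` array containing every
// unordered pair of person IDs in the movie, e.g. '101-102'.
function withPairIds(doc) {
  // De-duplicate and sort the IDs so each pair has a single canonical spelling.
  const ids = [...new Set(doc.people.map((p) => p.id))].sort();
  const pairIds = [];
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      pairIds.push(`${ids[i]}-${ids[j]}`);
    }
  }
  return { ...doc, pairIds };
}

// Assumed mapping addition: a plain keyword field on the root document
// (outside the nested `people` objects).
const mappingAddition = {
  properties: {
    pairIds: { type: 'keyword' },
  },
};

// Search body: one bucket per pair (doc_count = number of movies containing it),
// with the contributing titles as a sub-aggregation.
const searchBody = {
  size: 0,
  aggs: {
    pairs: {
      terms: { field: 'pairIds', size: 10 },
      aggs: {
        movies: { terms: { field: 'title', size: 100 } },
      },
    },
  },
};

const goodfellas = {
  title: 'Goodfellas',
  people: [
    { id: '101', name: 'Martin Scorsese', role: 'Director' },
    { id: '102', name: 'Robert De Niro', role: 'Actor' },
    { id: '103', name: 'Ray Liotta', role: 'Actor' },
    { id: '104', name: 'Joe Pesci', role: 'Actor' },
    { id: '105', name: 'Frank Vincent', role: 'Actor' },
  ],
};

console.log(withPairIds(goodfellas).pairIds);
```

Director-Actor pairs or triples would need further fields precomputed the same way, and the number of values per document grows as C(n, k), so this moves the combinatorial cost to index time rather than removing it.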

Ideally I'd like to be able to produce these results using a single Elasticsearch aggregation, although I appreciate this may not be possible.

What solutions are there to this problem?

Thanks in advance.