Get most frequent combinations of nested docs

I have an Elasticsearch index with nested documents (colors). I would like a query with an aggregation that shows the most frequent combinations of colors.

An example:

I have three documents:

[
	{
		"name": "Document A",
		"colors": [
			{ "name": "Red", "slug": "red" },
			{ "name": "Green", "slug": "green" },
			{ "name": "Blue", "slug": "blue" }
		]
	},
	{
		"name": "Document B",
		"colors": [
			{ "name": "Green", "slug": "green" },
			{ "name": "Blue", "slug": "blue" }
		]
	},
	{
		"name": "Document C",
		"colors": [
			{ "name": "Red", "slug": "red" },
			{ "name": "Blue", "slug": "blue" }
		]
	}
]

I would like to get the result:

green-blue: doc count=2
red-blue: doc count=2
red-green: doc count=1
red-green-blue: doc count=1

I would also like to be able to filter by how many parts a combination has, e.g. combinations of at least 2 and at most 5 colors. The order does not matter: red-green is the same as green-red.
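To make the expected result precise, here is a minimal Python sketch of the counting I have in mind, done client-side over the three example documents. It is not an Elasticsearch query, only an illustration of the desired output; the function name `count_color_combinations` and the `min_size`/`max_size` parameters are made up for this example.

```python
from collections import Counter
from itertools import combinations

# The three example documents, reduced to the fields that matter here.
docs = [
    {"name": "Document A", "colors": [
        {"name": "Red", "slug": "red"},
        {"name": "Green", "slug": "green"},
        {"name": "Blue", "slug": "blue"},
    ]},
    {"name": "Document B", "colors": [
        {"name": "Green", "slug": "green"},
        {"name": "Blue", "slug": "blue"},
    ]},
    {"name": "Document C", "colors": [
        {"name": "Red", "slug": "red"},
        {"name": "Blue", "slug": "blue"},
    ]},
]


def count_color_combinations(docs, min_size=2, max_size=5):
    """Count in how many documents each unordered slug combination occurs."""
    counts = Counter()
    for doc in docs:
        # Sorting makes the combination order-independent:
        # red-green and green-red end up as the same tuple.
        slugs = sorted({color["slug"] for color in doc["colors"]})
        upper = min(max_size, len(slugs))
        for size in range(min_size, upper + 1):
            for combo in combinations(slugs, size):
                counts[combo] += 1
    return counts


counts = count_color_combinations(docs)
for combo, doc_count in counts.most_common():
    print("-".join(combo), "doc count =", doc_count)
```

Because the slugs are sorted, the keys come out alphabetically (e.g. `blue-green` instead of `green-blue`), which is fine since the order does not matter. Doing this client-side means enumerating every combination per document, which is exactly the blow-up I would like to avoid on the Elasticsearch side.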

My mapping looks like this:

{
  "mappings": {
    "_doc": {
      "properties": {
        "created": {
          "type": "date"
        },
        "name": {
          "type": "text"
        },
        "colors": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "slug": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

What is the easiest way of doing this? I hope I don't have to store all possible combinations at index time. There are 4000 colors, so that would blow everything up.

What is the most efficient way to get the most frequent combinations of nested doc slugs?
