Get most frequent combinations of nested docs

JsRg · August 22, 2023, 4:41pm

I have an Elasticsearch index with nested documents (colors). I would like a query with an aggregation that shows the most frequent combinations of colors.

An example:

I have three documents:

[
	{
		"name": "Document A",
		"colors": [
			{ "name": "Red", "slug": "red" },
			{ "name": "Green", "slug": "green" },
			{ "name": "Blue", "slug": "blue" }
		]
	},
	{
		"name": "Document B",
		"colors": [
			{ "name": "Green", "slug": "green" },
			{ "name": "Blue", "slug": "blue" }
		]
	},
	{
		"name": "Document C",
		"colors": [
			{ "name": "Red", "slug": "red" },
			{ "name": "Blue", "slug": "blue" }
		]
	}
]

I would like to get the result:

green-blue: doc count=2
red-blue: doc count=2
red-green: doc count=1
red-green-blue: doc count=1

I would also like to be able to filter by how many parts a combination has, e.g. only combinations of at least 2 and at most 5 colors. The order does not matter: red-green is the same as green-red.
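
To make the expected result concrete, here is a minimal client-side sketch in Python. It is not an Elasticsearch query, only an illustration of the counting I am after, applied to the three example documents above. The function name `count_combinations` and its `min_size`/`max_size` parameters are made up for this example. Slugs are sorted before joining so that order does not matter, which means green-blue shows up under the key `blue-green`.

```python
from collections import Counter
from itertools import combinations

# The three example documents, reduced to their color slugs.
docs = {
    "Document A": ["red", "green", "blue"],
    "Document B": ["green", "blue"],
    "Document C": ["red", "blue"],
}


def count_combinations(docs, min_size=2, max_size=5):
    """Count in how many documents each unordered slug combination occurs."""
    counts = Counter()
    for slugs in docs.values():
        # Deduplicate and sort so red-green and green-red map to the same key.
        unique = sorted(set(slugs))
        largest = min(max_size, len(unique))
        for size in range(min_size, largest + 1):
            for combo in combinations(unique, size):
                counts["-".join(combo)] += 1
    return counts


result = count_combinations(docs)
for key, doc_count in result.most_common():
    print(f"{key}: doc count={doc_count}")
```

Each combination is counted at most once per document, so the numbers are document counts rather than counts of nested items.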

My mapping looks like this:

{
  "mappings": {
    "_doc": {
      "properties": {
        "created": {
          "type": "date"
        },
        "name": {
          "type": "text"
        },
        "colors": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "slug": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

What is the easiest way of doing this? I hope I don't have to store all possible combinations at index time. There are 4000 colors, so this would blow up everything.
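
As a rough back-of-the-envelope check on why I expect this to blow up, the sketch below counts how many combinations exist. It assumes the 4000 colors mentioned above and the 2 to 5 size range from my example; the helper `combos_per_document` is a hypothetical name used only here.

```python
from math import comb  # available in Python 3.8+

TOTAL_COLORS = 4000  # number of distinct colors mentioned in the post


def combos_per_document(num_colors, min_size=2, max_size=5):
    """Combinations a single document with num_colors distinct colors would produce."""
    return sum(comb(num_colors, size) for size in range(min_size, max_size + 1))


# Distinct pairs and triples that could exist across the whole color set.
possible_pairs = comb(TOTAL_COLORS, 2)
possible_triples = comb(TOTAL_COLORS, 3)
print(f"possible pairs:   {possible_pairs:,}")
print(f"possible triples: {possible_triples:,}")

# Extra values that would have to be stored per document if precomputed.
for n in (3, 10, 20):
    print(f"{n} colors -> {combos_per_document(n):,} combinations")
```

The number of combinations stored per document depends only on how many colors that document has, but the number of distinct keys an aggregation could see grows with the full color set.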

What is the most efficient way to get the most frequent combinations of nested doc slugs?

system · September 19, 2023, 4:41pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
