How to generate all permutations of child documents in Elasticsearcherch

Hi I have following schema of user objects:

curl -X PUT "localhost:9200/test_example" -H 'Content-Type: application/json' -d'
{
    "mappings": {
        "answer" : {
            "_parent" : {
                "type" : "user"
            }
        }
    }
}
'

curl -X PUT "localhost:9200/test_example/user/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index": {"_id": 1} }
{ "user": 1, "foo": 1}
{ "index": {"_id": 2} }
{ "user": 2, "foo": 2}
{ "index": {"_id": 3} }
{ "user": 3, "foo": 1}
{ "index": {"_id": 4} }
{ "user": 4, "foo": 1, "answers": {"5": [6,3], "6":[6], "7":[6]}, "potential_approach": true}
'

curl -X PUT "localhost:9200/test_example/answer/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index": { "parent": 1}}
{"question":5, "answer":6, "negative": false}
{ "index": { "parent": 1}}
{"question":5, "answer":3, "negative": false}
{ "index": { "parent": 1}}
{"question":6, "answer":6, "negative": false}
{ "index": { "parent": 1}}
{"question":7, "answer":6, "negative": false}
{ "index": { "parent": 2}}
{"question":5, "answer":6, "negative": false}
{ "index": { "parent": 2}}
{"question":5, "answer":3, "negative": false}
{ "index": { "parent": 2}}
{"question":6, "answer":1, "negative": false}
{ "index": { "parent": 2}}
{"question":7, "answer":2, "negative": false}
{ "index": { "parent": 3}}
{"question":5, "answer":6, "negative": false}
{ "index": { "parent": 3}}
{"question":5, "answer":2, "negative": false}
{ "index": { "parent": 3}}
{"question":6, "answer":4, "negative": false}
{ "index": { "parent": 3}}
{"question":7, "answer":3, "negative": false}
'

Now I want to create a count of all users split by their questions answers

So create a breakdown of users and all their question answers permutations:

{Q:1,A:1} and at same time {Q2, A:1}, then {Q:1,A:2} and at same time {Q2, A:1} ... and so on.

Conceptually something like this aggregation:

curl -X POST "localhost:9200/test_example/user/_search?pretty&size=1"
-H 'Content-Type: application/json' -d' {   "query": {},   "aggs": {
    "permutations": {
      "children": {
        "type": "answer"
      },
      "aggs": {
        "filtered": {
          "filter": {
            "bool": {
              "should": [],
              "must_not": [],
              "must": [
                {
                  "term": {
                    "question": 5
                  }
                }
              ]
            }
          },
          "aggs": {
            "5": {
              "terms": {
                "field": "answer",
                "size": 500
              },
              "aggs": {
                "filtered": {
                  "filter": {
                    "bool": {
                      "should": [],
                      "must_not": [],
                      "must": [
                        {
                          "term": {
                            "question": 6
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "6": {
                      "terms": {
                        "field": "answer",
                        "size": 500
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }   } } '

I understand why this doesn't work, because one bucket excludes the documents that I would like to have in sub bucket, but I'm wondering if there are other approaches to the problem reverse nested aggregation could do this, but I have child documents instead of nested ones.

One approach that I could take would be maybe storing all the answers on user object in form that I added on user 4, then something like that would work fine:

curl -X POST "localhost:9200/test_example/user/_search?pretty&size=1" -H 'Content-Type: application/json' -d'
{
  "query": {},
  "aggs": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [],
          "must_not": [],
          "must": []
        }
      },
      "aggs": {
        "5": {
          "terms": {
            "field": "answers.5",
            "size": 500
          },
          "aggs": {
            "filtered": {
              "filter": {
                "bool": {
                  "should": [],
                  "must_not": [],
                  "must": []
                }
              },
              "aggs": {
                "6": {
                  "terms": {
                    "field": "answers.6",
                    "size": 500
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
'

Following query would generate exactly what I want, my main issue is that there are 2000 question ID's so I'm worried that for few million user rows, Elasticsearch might blow up for me (some users might have answered 500-700 distinct questions). Are there any approaches I could use to work with the schema where answers are child documents? Or maybe there is a way to check if ES can handle documents with potentially many keys in them.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.