Uniqueness in filter aggregations

Hi,

I would like to ask for ideas on how to solve the following. Imagine users filling in personal surveys, e.g. stating what they usually smoke. Each user can pick one or more options from the survey checklist. To get the results, I run a filters aggregation as follows:

POST users/_search
{
  "aggs": {
    "e0b5ca053d9a61a415c43586cf07cccb7f9793a7": {
        "filters": {
            "filters": {
                "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_cigarettes": {
                    "nested": {
                        "path": "surveys",
                        "query": {
                            "term": {
                                "surveys.26.answers.whatExactlyDoYouSmoke.cigarettes": true
                            }
                        }
                    }
                },
                "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_cigars": {
                    "nested": {
                        "path": "surveys",
                        "query": {
                            "term": {
                                "surveys.26.answers.whatExactlyDoYouSmoke.cigars": true
                            }
                        }
                    }
                },
                "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_pipe": {
                    "nested": {
                        "path": "surveys",
                        "query": {
                            "term": {
                                "surveys.26.answers.whatExactlyDoYouSmoke.pipe": true
                            }
                        }
                    }
                },
                "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_electroniccigarettes": {
                    "nested": {
                        "path": "surveys",
                        "query": {
                            "term": {
                                "surveys.26.answers.whatExactlyDoYouSmoke.electroniccigarettes": true
                            }
                        }
                    }
                }
            },
            "other_bucket_key": "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::_others"
        }
    }
  }
}

Imagine a user who smokes both cigarettes and cigars: as a result, he is counted in two buckets.

What I need is uniqueness of users (user IDs) across all buckets: even if a user checks two options (cigarettes and cigars), once he falls into one bucket (e.g. cigars), he must be excluded from the other (cigarettes). See the result:

"aggregations" : {
    "e0b5ca053d9a61a415c43586cf07cccb7f9793a7" : {
      "meta" : { },
      "buckets" : {
        "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_cigarettes" : {
          "doc_count" : 1
        },
        "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_cigars" : {
          "doc_count" : 1 // SAME USER AS IN PREVIOUS AGG. REQUIRED TO BE REMOVED
        },
        "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_electroniccigarettes" : {
          "doc_count" : 0
        },
        "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_pipe" : {
          "doc_count" : 0
        },
        "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::_others" : {
          "doc_count" : 4
        }
      }
    }
  }

It is important to achieve this in a single request, not by splitting it into several requests and excluding the user IDs (buckets) returned by the previous aggregations. I also need to keep the other_bucket_key bucket.

Any ideas? Scripting? Pipelines? Building negations into each filter, based on the filters that come before it?
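
To make the negation idea concrete, here is a minimal Python sketch of how such a request could be generated. It is only an illustration under assumptions: the helper names `nested_term` and `build_exclusive_filters` are made up for this example, and the option names are spelled as in the request above. Each bucket's query becomes a `bool` with the original nested clause in `filter` and all earlier clauses in `must_not`, so a user should land only in the first bucket that matches. The set of users matching none of the filters does not change, so the other bucket should stay the same.

```python
import json

FIELD_PREFIX = "surveys.26.answers.whatExactlyDoYouSmoke"


def nested_term(option):
    """Nested term clause for one checkbox, mirroring the original request."""
    return {
        "nested": {
            "path": "surveys",
            "query": {"term": {f"{FIELD_PREFIX}.{option}": True}},
        }
    }


def build_exclusive_filters(agg_id, options):
    """Build a `filters` aggregation whose buckets cannot overlap.

    Bucket i matches option i and none of the options before it
    (first match wins, in the order given by `options`).
    """
    filters = {}
    earlier = []  # clauses of all options already handled
    for option in options:
        clause = nested_term(option)
        bool_query = {"filter": [clause]}
        if earlier:
            # Exclude every user already claimed by an earlier bucket.
            bool_query["must_not"] = list(earlier)
        filters[f"{agg_id}::{FIELD_PREFIX}_has_{option}"] = {"bool": bool_query}
        earlier.append(clause)
    return {
        "filters": {
            "filters": filters,
            "other_bucket_key": f"{agg_id}::_others",
        }
    }


agg_id = "e0b5ca053d9a61a415c43586cf07cccb7f9793a7"
options = ["cigarettes", "cigars", "pipe", "electroniccigarettes"]
body = {"aggs": {agg_id: build_exclusive_filters(agg_id, options)}}

# Print the JSON body that would be sent to POST users/_search.
print(json.dumps(body, indent=2))
```

The order of `options` decides which bucket wins: with the order above, a user who checked both cigarettes and cigars would be counted under cigarettes only.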

I am a newbie, so I would welcome answers for dummies ;) Thx.

Hey,

sounds like the cardinality aggregation could help you in this case.

--Alex

Thx for the response. I have already tried cardinality, but possibly did not understand how to use it in this specific case, where the mapping of the fields used in each filter looks as follows:

...
"whatExactlyDoYouSmoke":{
   "properties":{
      "cigarettes":{
         "type":"boolean"
      },
      "cigars":{
         "type":"boolean"
      },
      "electronicCigarettes":{
         "type":"boolean"
      },
      "pipe":{
         "type":"boolean"
      }
   }
},
...

Could you point out where exactly I should put the cardinality aggregation? And should I use scripting to precompute the value for cardinality, as in the example on the "Cardinality aggregation" page of the Elasticsearch Guide [8.1], or use it directly somehow?
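
For illustration only, a minimal Python sketch of one place a cardinality aggregation could go: as a sub-aggregation of the filters aggregation, so that it runs once per bucket. The field name `userId`, the helper `add_unique_users` and the shortened bucket keys are assumptions made for this example; none of them comes from the mapping above.

```python
import json


def add_unique_users(filters_agg, user_field):
    """Return a copy of a `filters` aggregation with a cardinality sub-aggregation.

    The sub-aggregation is evaluated per bucket and counts the distinct
    values of `user_field` among the documents in that bucket.
    """
    agg = dict(filters_agg)  # shallow copy; the input dict is left untouched
    agg["aggs"] = {"unique_users": {"cardinality": {"field": user_field}}}
    return agg


# Shortened stand-in for the filters aggregation from the original request.
filters_agg = {
    "filters": {
        "filters": {
            "has_cigars": {
                "nested": {
                    "path": "surveys",
                    "query": {
                        "term": {
                            "surveys.26.answers.whatExactlyDoYouSmoke.cigars": True
                        }
                    },
                }
            }
        },
        "other_bucket_key": "_others",
    }
}

# "userId" is a hypothetical field name, not taken from the mapping in this thread.
request_body = {"aggs": {"smoke": add_unique_users(filters_agg, "userId")}}
print(json.dumps(request_body, indent=2))
```

Placed like this, cardinality counts distinct values inside each bucket separately, and the count is approximate. On its own it would not take a user out of the cigars bucket just because he also appears in the cigarettes bucket, and if every document in `users` is a single user it would be expected to simply mirror `doc_count`.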
