Uniqueness in filter aggregations

Lucass · April 26, 2022, 11:20am

Hi,

I would like to ask for ideas on solving the following. Imagine users filling in personal surveys, e.g. stating what they usually smoke. Each user can pick one or more options from the survey checklist. On the results, I run a filters aggregation as follows:

POST users/_search
{
  "aggs": {
    "e0b5ca053d9a61a415c43586cf07cccb7f9793a7": {
        "filters": {
            "filters": {
                "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_cigarettes": {
                    "nested": {
                        "path": "surveys",
                        "query": {
                            "term": {
                                "surveys.26.answers.whatExactlyDoYouSmoke.cigarettes": true
                            }
                        }
                    }
                },
                "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_cigars": {
                    "nested": {
                        "path": "surveys",
                        "query": {
                            "term": {
                                "surveys.26.answers.whatExactlyDoYouSmoke.cigars": true
                            }
                        }
                    }
                },
                "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_pipe": {
                    "nested": {
                        "path": "surveys",
                        "query": {
                            "term": {
                                "surveys.26.answers.whatExactlyDoYouSmoke.pipe": true
                            }
                        }
                    }
                },
                "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_electroniccigarettes": {
                    "nested": {
                        "path": "surveys",
                        "query": {
                            "term": {
                                "surveys.26.answers.whatExactlyDoYouSmoke.electronicCigarettes": true
                            }
                        }
                    }
                }
            },
            "other_bucket_key": "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::_others"
        }
    }
  }
}

Imagine a user who smokes both cigarettes and cigars: as a result, he is counted in two buckets.

What I need is uniqueness of users (userIDs) across all buckets: even though a user checks two options (cigarettes and cigars), once he falls into one bucket (e.g. cigars), I need to exclude him from the other (cigarettes). See the current result:

"aggregations" : {
    "e0b5ca053d9a61a415c43586cf07cccb7f9793a7" : {
      "meta" : { },
      "buckets" : {
        "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_cigarettes" : {
          "doc_count" : 1
        },
        "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_cigars" : {
          "doc_count" : 1 // SAME USER AS IN PREVIOUS AGG. REQUIRED TO BE REMOVED
        },
        "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_electroniccigarettes" : {
          "doc_count" : 0
        },
        "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::surveys.26.answers.whatExactlyDoYouSmoke_has_pipe" : {
          "doc_count" : 0
        },
        "e0b5ca053d9a61a415c43586cf07cccb7f9793a7::_others" : {
          "doc_count" : 4
        }
      }
    }
  }

It is important to achieve this in a single request, not by splitting it into several requests and excluding the userIDs (buckets) of previous aggregations. I also need to keep the other_bucket_key bucket.

Any ideas? Scripting? Pipelines? Construct negations in each aggregation built of previous ones?

I am a newbie so I would welcome answers for dummies;) Thx.
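
The "negations" idea from the question can be sketched roughly as follows. This is only a sketch under assumptions, not a confirmed solution from this thread: the priority order (cigars, then cigarettes, then pipe, then electronic cigarettes) is arbitrary, the bucket and aggregation names are shortened for readability, and the electronicCigarettes spelling follows the mapping posted later in this thread. Each filter requires its own option and excludes every higher-priority option via must_not, so a user can land in at most one named bucket, and users matching none of the options still fall into the other bucket. The must_not clauses sit outside the nested queries so that they apply to the whole user document. Request body for POST users/_search:

```json
{
  "size": 0,
  "aggs": {
    "smoke_exclusive": {
      "meta": { "note": "sketch only: bucket names and priority order are illustrative assumptions" },
      "filters": {
        "other_bucket_key": "_others",
        "filters": {
          "has_cigars": {
            "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.cigars": true } } }
          },
          "has_cigarettes": {
            "bool": {
              "filter": [
                { "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.cigarettes": true } } } }
              ],
              "must_not": [
                { "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.cigars": true } } } }
              ]
            }
          },
          "has_pipe": {
            "bool": {
              "filter": [
                { "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.pipe": true } } } }
              ],
              "must_not": [
                { "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.cigars": true } } } },
                { "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.cigarettes": true } } } }
              ]
            }
          },
          "has_electroniccigarettes": {
            "bool": {
              "filter": [
                { "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.electronicCigarettes": true } } } }
              ],
              "must_not": [
                { "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.cigars": true } } } },
                { "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.cigarettes": true } } } },
                { "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.pipe": true } } } }
              ]
            }
          }
        }
      }
    }
  }
}
```

With this shape, the user who ticked both cigarettes and cigars would be counted only under has_cigars. The trade-off is that the result depends on the chosen order, and the number of must_not clauses grows with each additional option.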

spinscale · April 27, 2022, 8:28am

Hey,

sounds like the cardinality aggregation could help you in this case.

--Alex

Lucass · April 27, 2022, 10:12am

Thanks for the response. Cardinality was something I had already tried, but I possibly did not understand how to use it in this specific case, where the mapping of the fields split across the filters looks as follows:

...
"whatExactlyDoYouSmoke":{
   "properties":{
      "cigarettes":{
         "type":"boolean"
      },
      "cigars":{
         "type":"boolean"
      },
      "electronicCigarettes":{
         "type":"boolean"
      },
      "pipe":{
         "type":"boolean"
      }
   }
},
...

Could you point out where exactly I should put the cardinality aggregation? And should I use a script to precompute the value for cardinality, following the example on the Cardinality aggregation page of the Elasticsearch Guide [8.1], or use it directly somehow?
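
On the question of where a cardinality aggregation could be placed, here is a minimal sketch. It assumes a root-level keyword field called userId, which is hypothetical: the thread mentions userIDs but never names the field. The aggregation names and the shortened bucket keys are also made up for illustration. One cardinality aggregation sits at the top level and counts distinct users overall; another is a sub-aggregation of the filters aggregation and counts distinct users inside each bucket. Request body for POST users/_search:

```json
{
  "size": 0,
  "aggs": {
    "distinct_users_total": {
      "meta": { "note": "sketch only: the userId field name is an assumption" },
      "cardinality": { "field": "userId" }
    },
    "smoke": {
      "filters": {
        "other_bucket_key": "_others",
        "filters": {
          "has_cigarettes": {
            "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.cigarettes": true } } }
          },
          "has_cigars": {
            "nested": { "path": "surveys", "query": { "term": { "surveys.26.answers.whatExactlyDoYouSmoke.cigars": true } } }
          }
        }
      },
      "aggs": {
        "distinct_users": {
          "cardinality": { "field": "userId", "precision_threshold": 3000 }
        }
      }
    }
  }
}
```

Note that cardinality returns an approximate distinct count per bucket. A user who ticked both options would still be counted once in each of the two buckets, so on its own this does not appear to make the buckets mutually exclusive. If every document in the users index represents exactly one user (an assumption), doc_count per bucket already equals the number of distinct users in that bucket.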

system · May 25, 2022, 10:13am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
