Trying to mimic a where clause

boldfish · July 19, 2017, 1:31pm

Hi say I have two fields "catA" and "catB" respectively indexed. A has possible values [ "1" ,"2", "n" ] and B has possible values ["i" ,"ii" ,-"n"] within the document set. I want to be able to know for what values of catB , the count of distinct catA values is >1

any help would be graciously appreciated.

Thanks in advance

PaulieMac · July 19, 2017, 2:12pm

Hello mate....

What you are looking for will be a combination of Aggregations, like this (in pseudocode, I don't have an elastic instance handy sorry):

A terms aggregation on "catB" to 'group by' those values, with a sub-aggregation of type 'value_count' to count the 'catA' values within the enclosing catB.

Then, as a sibling agg to the 'terms' you'll want a 'bucket_selector' to filter the results...

So at a rough stab, I think it would be something like this... Apologies if it doesn't run, just doing it off the top of my head!

"aggs" : {
	"summary" : {
		"terms" : { "field" : "catB", "size" : "200"},
		"aggs": { 
			"catAcount" : { "value_count" : { "field" : "catA" } }
		}
	},
	"having" : "bucket_selector": {
					"buckets_path": {
					  "catAcounts": "summary>catAcount"
					},
					"script": "params.catAcounts > 1"
				}
}

boldfish · July 20, 2017, 9:11am

Thanks so much for the answer . Its almost there.. I can get this to work
"aggs" : {
"summary" : {
"terms" : { "field" : "catB", "size" : "200"},
"aggs": {
"catAcount" : { "cardinality" : { "field" : "catA" } }
}
}}}
Which changes the "value_count" to "cardinality" this will tell me the number of values for catA in each document that contains the CatB field. I could not get the bucket aggregator to work tho. Something to do with the brackets as I am getting errors about sibling queries not allowed etc. If anyone has anything to offer on getting the above working with the bucket aggregator so as to get a list of the catA values it would be great... either way thanks for the help. Im halfway there

PaulieMac · July 20, 2017, 9:22am

Hello mate -sorry my first attempt didn't work! Looking at it, perhaps I put the 'bucket_selector' in the wrong spot? What if it is shifted to be contained by the 'terms' structure - i.e. like this:

(I should note here, I haven't used the bucket selector before, but I do tend to use other pipeline aggs a fair bit, so I think it should work ok with the right tweak

"aggs" : {
"summary" : {
"terms" : { "field" : "catB", "size" : "200"},
"aggs": {
"catAcount" : { "value_count" : { "field" : "catA" } },
"having" : "bucket_selector": {
"buckets_path": {
"catAcounts": "catAcount"
},
"script": "params.catAcounts > 1"
}
}
}
}

boldfish · July 20, 2017, 9:43am

"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "Aggregation definition for [having starts with a [VALUE_STRING], expected a [START_OBJECT].",

Thanks for trying again, alas it doesnt work though. I have twiddled with it little myself but no luck so far.. thanks again.

boldfish · July 20, 2017, 9:48am

actually i got this error now :{
"error": {
"root_cause": [],
"type": "search_phase_execution_exception",
"reason": "",
"phase": "fetch",
"grouped": true,
"failed_shards": [],
"caused_by": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"params.catAcount > 1",
" ^---- HERE"
],
"script": "params.catAcount > 1",
"lang": "painless",
"caused_by": {
"type": "null_pointer_exception",
"reason": null
}
}
},
"status": 503
}

boldfish · July 20, 2017, 10:05am

hey the error was I my end. Thanks It worked Im marking this as solved . All the very best and thanks again for the help.

PaulieMac · July 20, 2017, 11:21am

Ha ha sorry been out grabbing a coffee - you got it working though? Sorry if my syntax was off... Getting these aggs working right can be finicky - I usually seem to need a few goes to get it right

boldfish · July 24, 2017, 10:12am

I know this query is marked complete but I don have an additional related question which I think would be misplaced in a new post. So here goes. If the final code is like this:
"summary" : {
"terms" : { "field" : "catB", "size" : "200"},
"aggs": {
"catAcount" : { "value_count" : { "field" : "catA" } },
"having" : "bucket_selector": {
"buckets_path": {
"catAcounts": "catAcount"
},
"script": "params.catAcounts > 1"
}
}
}
}

how do I do a sub aggregation on the buckets? as in where do I nest the sub aggregation ? I found the documentation particularly vague and nebulous on this matter... anyway if you have any idea it would be great. Also thanks a million for the help.

system · August 21, 2017, 10:12am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Bucket selector in sub aggregation or cardinality aggregation Elasticsearch	6	2249	August 28, 2019
Counting the number of buckets matching certain criteria Elasticsearch	3	8492	July 5, 2017
Help with aggregations Elasticsearch	4	1863	December 25, 2017
Filtering by a aggregation (like SQL having clause) Elasticsearch	4	6052	November 5, 2018
Filters Aggregation Bucket Selector Elasticsearch	1	588	July 26, 2019

Trying to mimic a where clause

Related topics