Trying to mimic a where clause


(sfoley@theemaillaundry.com) #1

Hi say I have two fields "catA" and "catB" respectively indexed. A has possible values [ "1" ,"2", "n" ] and B has possible values ["i" ,"ii" ,-"n"] within the document set. I want to be able to know for what values of catB , the count of distinct catA values is >1

any help would be graciously appreciated.

Thanks in advance


(Paul McMahon) #2

Hello mate....

What you are looking for will be a combination of Aggregations, like this (in pseudocode, I don't have an elastic instance handy sorry):

A terms aggregation on "catB" to 'group by' those values, with a sub-aggregation of type 'value_count' to count the 'catA' values within the enclosing catB.

Then, as a sibling agg to the 'terms' you'll want a 'bucket_selector' to filter the results...

So at a rough stab, I think it would be something like this... Apologies if it doesn't run, just doing it off the top of my head!

"aggs" : {
	"summary" : {
		"terms" : { "field" : "catB", "size" : "200"},
		"aggs": { 
			"catAcount" : { "value_count" : { "field" : "catA" } }
		}
	},
	"having" : "bucket_selector": {
					"buckets_path": {
					  "catAcounts": "summary>catAcount"
					},
					"script": "params.catAcounts > 1"
				}
}

(sfoley@theemaillaundry.com) #3

Thanks so much for the answer . Its almost there.. I can get this to work
"aggs" : {
"summary" : {
"terms" : { "field" : "catB", "size" : "200"},
"aggs": {
"catAcount" : { "cardinality" : { "field" : "catA" } }
}
}}}
Which changes the "value_count" to "cardinality" this will tell me the number of values for catA in each document that contains the CatB field. I could not get the bucket aggregator to work tho. Something to do with the brackets as I am getting errors about sibling queries not allowed etc. If anyone has anything to offer on getting the above working with the bucket aggregator so as to get a list of the catA values it would be great... either way thanks for the help. Im halfway there :slight_smile:


(Paul McMahon) #4

Hello mate -sorry my first attempt didn't work! Looking at it, perhaps I put the 'bucket_selector' in the wrong spot? What if it is shifted to be contained by the 'terms' structure - i.e. like this:

(I should note here, I haven't used the bucket selector before, but I do tend to use other pipeline aggs a fair bit, so I think it should work ok with the right tweak :slight_smile:

"aggs" : {
"summary" : {
"terms" : { "field" : "catB", "size" : "200"},
"aggs": {
"catAcount" : { "value_count" : { "field" : "catA" } },
"having" : "bucket_selector": {
"buckets_path": {
"catAcounts": "catAcount"
},
"script": "params.catAcounts > 1"
}
}
}
}


(sfoley@theemaillaundry.com) #5

"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "Aggregation definition for [having starts with a [VALUE_STRING], expected a [START_OBJECT].",

Thanks for trying again, alas it doesnt work though. I have twiddled with it little myself but no luck so far.. thanks again.


(sfoley@theemaillaundry.com) #6

actually i got this error now :{
"error": {
"root_cause": [],
"type": "search_phase_execution_exception",
"reason": "",
"phase": "fetch",
"grouped": true,
"failed_shards": [],
"caused_by": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"params.catAcount > 1",
" ^---- HERE"
],
"script": "params.catAcount > 1",
"lang": "painless",
"caused_by": {
"type": "null_pointer_exception",
"reason": null
}
}
},
"status": 503
}


(sfoley@theemaillaundry.com) #7

hey the error was I my end. Thanks It worked Im marking this as solved . All the very best and thanks again for the help.


(Paul McMahon) #8

Ha ha sorry been out grabbing a coffee - you got it working though? Sorry if my syntax was off... Getting these aggs working right can be finicky - I usually seem to need a few goes to get it right :slight_smile:


(sfoley@theemaillaundry.com) #9

I know this query is marked complete but I don have an additional related question which I think would be misplaced in a new post. So here goes. If the final code is like this:
"summary" : {
"terms" : { "field" : "catB", "size" : "200"},
"aggs": {
"catAcount" : { "value_count" : { "field" : "catA" } },
"having" : "bucket_selector": {
"buckets_path": {
"catAcounts": "catAcount"
},
"script": "params.catAcounts > 1"
}
}
}
}

how do I do a sub aggregation on the buckets? as in where do I nest the sub aggregation ? I found the documentation particularly vague and nebulous on this matter... anyway if you have any idea it would be great. Also thanks a million for the help.


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.