Combination of two top terms aggregations


#1

Hi,

When I run a terms aggregation on FIELD1 and a terms aggregation on FIELD2, I get the following results:

FIELD1 : {
TOPTERM1FIELD1 : 10000,
TOPTERM2FIELD1 : 1000
}
and
FIELD2 : {
TOPTERM1FIELD2 : 121212,
TOPTERM2FIELD2 : 1212
}

If I combine the two aggregations using a sub-aggregation, I get:

FIELD1 : {
TOPTERM1FIELD1 : {
FIELD2 : {
LOCALTOPTERM1FIELD2: 12
}
}
}

and so on.
Instead, I would like the counts for all combinations of the top n terms of FIELD1 with the top n terms of FIELD2.
For example:

TOPTERM1FIELD1 AND TOPTERM1FIELD2 : 12
TOPTERM1FIELD1 AND TOPTERM2FIELD2 : 4

I tried the composite aggregation to achieve this, but it seems to return buckets ordered by term value rather than by frequency.
Is there a way to achieve this?

Thank you for your help,

Aurélien


#2

I tried to use the "order by count desc" option available in the classic terms aggregation:
{
	"size": 0,
	"aggs": {
		"test_composite": {
			"composite": {
				"sources": [{
					"IPH": {
						"terms": {
							"field": "FIELD",
							"order": { "_count": "desc" }
						}
					}
				}]
			}
		}
	}
}
but it didn't work.


#3

I realized that my explanation wasn't very clear, so here is a concrete example.

When I run the two aggregations separately:

{
    "aggs" : {
    	"IPH": { "terms" : { "field": "IPH", "size" : 2} } ,
    	"PID": { "terms" : { "field": "PID" , "size" : 2} } 
     }
}

I get the following result:

{
	"aggregations": {
		"IPH": {
			"buckets": [{
					"key": "035/00",
					"doc_count": 722
				},
				{
					"key": "043/00",
					"doc_count": 559
				}
			]
		},
		"PID": {
			"buckets": [{
					"key": "960897",
					"doc_count": 170
				},
				{
					"key": "71628332",
					"doc_count": 147
				}
			]
		}
	}
}

When I try to combine them with a sub-aggregation:

{
	"aggs": {
		"IPH": {
			"terms": {
				"field": "IPH",
				"size": 2
			},
			"aggs": {
				"PID": {
					"terms": {
						"field": "PID",
						"size": 2
					}
				}
			}
		}
	}
}

I get the following result:

 {
 	"aggregations": {
 		"IPH": {
 			"buckets": [{
 					"key": "035/00",
 					"doc_count": 722,
 					"PID": {
 						"buckets": [{
 								"key": "906768",
 								"doc_count": 51
 							},
 							{
 								"key": "3975774",
 								"doc_count": 49
 							}
 						]
 					}
 				},
 				{
 					"key": "043/00",
 					"doc_count": 559,
 					"PID": {
 						"buckets": [{
 								"key": "3975774",
 								"doc_count": 49
 							},
 							{
 								"key": "640822",
 								"doc_count": 33
 							}
 						]
 					}
 				}
 			]
 		}
 	}
 }

I am trying to find a way to get

[{
	"IPH": "035/00",
	"PID": "960897",
	"count": 12
}, {
	"IPH": "035/00",
	"PID": "71628332",
	"count": 12
}, {
	"IPH": "043/00",
	"PID": "960897",
	"count": 12
}, {
	"IPH": "043/00",
	"PID": "71628332",
	"count": 12
}]

Does anybody have an idea how to achieve this?

Thank you,


(Mark Harwood) #4

So you essentially want to flatten the results (I don't know where the count of 12 comes from)?
You can use a script in a single terms aggregation to combine the PID and IPH values into a single string that acts as the key you are looking for.
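
A minimal sketch of that idea, assuming IPH and PID are single-valued keyword fields. The aggregation name IPH_PID, the underscore separator, and the size of 10 are illustrative choices, not taken from the thread:

```json
{
	"size": 0,
	"aggs": {
		"IPH_PID": {
			"terms": {
				"script": {
					"lang": "painless",
					"source": "doc['IPH'].value + '_' + doc['PID'].value"
				},
				"size": 10
			}
		}
	}
}
```

Each bucket key would then be a combined string such as "035/00_960897", and its doc_count the number of documents carrying both values. Documents missing either field would need extra handling in the script.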


#5

Hi,

Thank you for your answer. The count of 12 is probably wrong, but basically, for

{
	"IPH": "035/00",
	"PID": "960897",
	"count": XX
}

XX should be the count of documents matching the query (IPH:"035/00" AND PID:"960897").
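
To make that definition concrete, the count for a single pair could be obtained with a request along these lines. This is only a sketch, assuming IPH and PID are keyword fields so that exact term filters apply; the count would be read from hits.total in the response:

```json
{
	"size": 0,
	"query": {
		"bool": {
			"filter": [
				{ "term": { "IPH": "035/00" } },
				{ "term": { "PID": "960897" } }
			]
		}
	}
}
```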

Let's assume that the count 12 is correct.

Maybe I misunderstood your solution, but I think it has a small problem for my use case. If I am right, with the script it is possible to get something like:

{
	"key": "032/00_898989",
	"doc_count": 231
}

because the intersection between IPH:"032/00" and PID:"898989" contains more than 12 documents.
However, this is not what I want because, taken separately, "032/00" is not in the top 2 terms for IPH and "898989" is not in the top 2 terms for PID, even if "032/00_898989" is in the top 2 combined terms for my query.

Thank you for your feedback,

Regards,

Aurélien

