Multi Field Aggregation

Alastair_James · October 16, 2014, 12:57pm

Hi there.

I am trying to create an aggregation that mimics the following SQL query:

SELECT col1, col2, COUNT(*), SUM(metric) FROM table GROUP BY col1, col2
ORDER BY SUM(metric) DESC

On the face of it, I could create an terms aggregation for col1, add a
terms aggregation for col2 inside it, and the metric aggregations inside
that. I could then dynamically build the SQL result like grid and sort it
myself. However this breaks down for large results set, or a paginated
result set of a larger result.

The problem is that the ES aggregation system always returns the top N
results for each parent and child bucket. Thus for each value of col1 I
have N values of col2.

What I really want is to consider all possible combinations of col1 and
col2 in the same way as SQL does it and return the top N based on some
other metric. E.g. in ES speak, a single aggregation where the keys are
tuples of (col1, col2).

I suppose one way would be to use a script terms aggregation to concatenate
each value of col1 and col2, however thats going to be slow.

Does anyone else have any ideas?

Ideally there would be a tuple aggregation built in, e.g.:

"my_agg":{
"tuple":{
"fields":["col1","col2"]
}
}

Would product keys that are objects like:

{
"col1":"value1",
"col2":"value2"
}

Does anyone know if this would be possible to write as a plugin?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9ec80e84-17b0-435d-a02e-f56a5d49f733%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Artur_Martins · October 17, 2014, 11:44pm

Hello,

I'm having the exact same problem.
Have you managed to find a solution?

My thread is here: LINK
https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/Oum03VSBzHQ

Thanks

On Thursday, October 16, 2014 1:57:35 PM UTC+1, Alastair James wrote:

Hi there.

I am trying to create an aggregation that mimics the following SQL query:

SELECT col1, col2, COUNT(*), SUM(metric) FROM table GROUP BY col1, col2
ORDER BY SUM(metric) DESC

On the face of it, I could create an terms aggregation for col1, add a
terms aggregation for col2 inside it, and the metric aggregations inside
that. I could then dynamically build the SQL result like grid and sort it
myself. However this breaks down for large results set, or a paginated
result set of a larger result.

The problem is that the ES aggregation system always returns the top N
results for each parent and child bucket. Thus for each value of col1 I
have N values of col2.

What I really want is to consider all possible combinations of col1 and
col2 in the same way as SQL does it and return the top N based on some
other metric. E.g. in ES speak, a single aggregation where the keys are
tuples of (col1, col2).

I suppose one way would be to use a script terms aggregation to
concatenate each value of col1 and col2, however thats going to be slow.

Does anyone else have any ideas?

Ideally there would be a tuple aggregation built in, e.g.:

"my_agg":{
"tuple":{
"fields":["col1","col2"]
}
}

Would product keys that are objects like:

{
"col1":"value1",
"col2":"value2"
}

Does anyone know if this would be possible to write as a plugin?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/713127be-b89e-42ee-8811-18dd0e31d16a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alastair_James · October 19, 2014, 3:25pm

Hi there.

No, no solution yet. I suspect its not possible in ES as it stands. It may
require a custom aggregation plugin to be written, or to use scripting.

On Saturday, 18 October 2014 00:44:48 UTC+1, Artur Martins wrote:

Hello,

I'm having the exact same problem.
Have you managed to find a solution?

My thread is here: LINK
https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/Oum03VSBzHQ

Thanks

On Thursday, October 16, 2014 1:57:35 PM UTC+1, Alastair James wrote:

Hi there.

I am trying to create an aggregation that mimics the following SQL query:

SELECT col1, col2, COUNT(*), SUM(metric) FROM table GROUP BY col1, col2
ORDER BY SUM(metric) DESC

On the face of it, I could create an terms aggregation for col1, add a
terms aggregation for col2 inside it, and the metric aggregations inside
that. I could then dynamically build the SQL result like grid and sort it
myself. However this breaks down for large results set, or a paginated
result set of a larger result.

The problem is that the ES aggregation system always returns the top N
results for each parent and child bucket. Thus for each value of col1 I
have N values of col2.

What I really want is to consider all possible combinations of col1 and
col2 in the same way as SQL does it and return the top N based on some
other metric. E.g. in ES speak, a single aggregation where the keys are
tuples of (col1, col2).

I suppose one way would be to use a script terms aggregation to
concatenate each value of col1 and col2, however thats going to be slow.

Does anyone else have any ideas?

Ideally there would be a tuple aggregation built in, e.g.:

"my_agg":{
"tuple":{
"fields":["col1","col2"]
}
}

Would product keys that are objects like:

{
"col1":"value1",
"col2":"value2"
}

Does anyone know if this would be possible to write as a plugin?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/497c90a2-1625-440f-a004-383dd3872703%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Artur_Martins · October 19, 2014, 3:48pm

I heard that it could be done with a fingerprint, but I don't know how to do this. It's in logstash.conf

Have a look:

Fingerprint the 3-tuple of source address, destination address, destination port

if [SourceAddress] and [DestinationAddress] {
fingerprint {
concatenate_sources => true
method => "SHA1"
key => "logstash"
source => [ "SourceAddress", "DestinationAddress", "DestinationPort" ]
}
}

But what exactly will this do? What next?
Hope you can understand this and help us both

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/005d8152-9ee0-49bb-a8d5-84ccb9634124%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alastair_James · October 19, 2014, 4:06pm

Hmmm. I dont know much about logstash, but I suspect thats concatenating
the 3 values into one string and taking a hash of it.... This would allow
you to group by that exact set of 3 columns.... however my use case is that
I need to be able to group by and subset of columns, so this could not be
pre-defined in that way.

Al

On 19 October 2014 16:48, Artur Martins arturphk@gmail.com wrote:

I heard that it could be done with a fingerprint, but I don't know how to
do this. It's in logstash.conf

Have a look:

Fingerprint the 3-tuple of source address, destination address,
destination port

if [SourceAddress] and [DestinationAddress] {
fingerprint {
concatenate_sources => true
method => "SHA1"
key => "logstash"
source => [ "SourceAddress", "DestinationAddress", "DestinationPort" ]
}
}

But what exactly will this do? What next?
Hope you can understand this and help us both

Thanks

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/gVLNqArGvVA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/005d8152-9ee0-49bb-a8d5-84ccb9634124%40googlegroups.com
.
For more options, visit https://groups.google.com/d/optout.

--
Dr Alastair James
CTO Ometria.com
Skype: al.james

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAMuyCY-u%2B6A%3DRUB1420BQZLt440eAShhSMeiwPWLLJgtq-Bm%3Dg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Artur_Martins · October 19, 2014, 9:01pm

Yup, that's true. It will be able to query only by that set of columns,
which is an issue for future requirements.
For now its a quick-fix, but I wonder if I'm missing something on the
"aggregations" function..

Cheers

On Sunday, October 19, 2014 5:07:31 PM UTC+1, Alastair James wrote:

Hmmm. I dont know much about logstash, but I suspect thats concatenating
the 3 values into one string and taking a hash of it.... This would allow
you to group by that exact set of 3 columns.... however my use case is that
I need to be able to group by and subset of columns, so this could not be
pre-defined in that way.

Al

On 19 October 2014 16:48, Artur Martins <artu...@gmail.com <javascript:>>
wrote:

I heard that it could be done with a fingerprint, but I don't know how to
do this. It's in logstash.conf

Have a look:

Fingerprint the 3-tuple of source address, destination address,
destination port

if [SourceAddress] and [DestinationAddress] {
fingerprint {
concatenate_sources => true
method => "SHA1"
key => "logstash"
source => [ "SourceAddress", "DestinationAddress", "DestinationPort" ]
}
}

But what exactly will this do? What next?
Hope you can understand this and help us both

Thanks

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/gVLNqArGvVA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearc...@googlegroups.com <javascript:>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/005d8152-9ee0-49bb-a8d5-84ccb9634124%40googlegroups.com
.
For more options, visit https://groups.google.com/d/optout.

--
Dr Alastair James
CTO Ometria.com
Skype: al.james

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f23f37e7-35a3-4a8a-9c8b-9334460f7aa7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Topic		Replies	Views
Can elasticsearch do GROUP BY multi fields and ORDER BY count? Elasticsearch	3	26471	May 9, 2019
Sub aggregations Elasticsearch	3	418	June 1, 2017
How to implement SQL 'group by' on multiple fields using aggregations? Elasticsearch	3	14915	July 6, 2017
Help: Flattened aggregations (with limiting and sorting) Elasticsearch	4	1552	July 6, 2017
Best pratices to handle multiple value field separated in terms stats? Elasticsearch	1	288	July 6, 2017

Multi Field Aggregation

Related topics