Hi there.
I am trying to create an aggregation that mimics the following SQL query:
SELECT col1, col2, COUNT(*), SUM(metric) FROM table GROUP BY col1, col2
ORDER BY SUM(metric) DESC
On the face of it, I could create an terms aggregation for col1, add a
terms aggregation for col2 inside it, and the metric aggregations inside
that. I could then dynamically build the SQL result like grid and sort it
myself. However this breaks down for large results set, or a paginated
result set of a larger result.
The problem is that the ES aggregation system always returns the top N
results for each parent and child bucket. Thus for each value of col1 I
have N values of col2.
What I really want is to consider all possible combinations of col1 and
col2 in the same way as SQL does it and return the top N based on some
other metric. E.g. in ES speak, a single aggregation where the keys are
tuples of (col1, col2).
I suppose one way would be to use a script terms aggregation to concatenate
each value of col1 and col2, however thats going to be slow.
Does anyone else have any ideas?
Ideally there would be a tuple aggregation built in, e.g.:
"my_agg":{
"tuple":{
"fields":["col1","col2"]
}
}
Would product keys that are objects like:
{
"col1":"value1",
"col2":"value2"
}
Does anyone know if this would be possible to write as a plugin?
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9ec80e84-17b0-435d-a02e-f56a5d49f733%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.