Aggregations and distinct


(nicolas maillard) #1

Hello everyone

I'm playing around with aggregations to get a better feel for what I can
do with them.
I was wondering how I would write the following aggregations.
I have entries listing user interactions and the IP at the time of each
interaction.
First, I want to count, for every user, the number of different IPs I have
seen for them.
Second, I want to find the most frequently seen IPs for every user.

My initial attempt was:

{
  "aggs" : {
    "genders" : {
      "terms" : {
        "field" : "user_id"
      },
      "aggs" : {
        "ips" : { "terms" : { "field" : "remoteip" } }
      }
    }
  }
}

But that does not seem to be quite right.

Regards

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e56fdb8d-9173-45cc-8abf-884d20018727%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(nicolas maillard) #2

In case this helps anyone, I have found the query.
As a side note, it is relatively slow:
my dataset is about 2.5 million docs on a single node with 15 GB of RAM.
The query:

{
  "aggs": {
    "user": {
      "terms": {
        "field": "user_id"
      },
      "aggs": {
        "popular_ips": {
          "terms": {
            "field": "remoteip"
          }
        }
      }
    }
  }
}
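
For anyone reading along, here is a rough sketch in plain Python of what a terms aggregation on user_id with a terms sub-aggregation on remoteip computes. It is only an emulation over made-up in-memory documents, not Elasticsearch code: the sample data, the function name ips_per_user and the result key distinct_ips are assumptions for illustration; only the field names and the popular_ips bucket name come from the thread.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents; only the field names user_id and remoteip
# are taken from the thread.
docs = [
    {"user_id": "alice", "remoteip": "10.0.0.1"},
    {"user_id": "alice", "remoteip": "10.0.0.1"},
    {"user_id": "alice", "remoteip": "10.0.0.2"},
    {"user_id": "bob", "remoteip": "10.0.0.3"},
]


def ips_per_user(docs, size=10):
    """Emulate a terms bucket per user with a nested terms bucket per IP."""
    counts = defaultdict(Counter)
    for doc in docs:
        # One outer bucket per user_id, one inner bucket per remoteip.
        counts[doc["user_id"]][doc["remoteip"]] += 1

    result = {}
    for user, ip_counts in counts.items():
        result[user] = {
            # Most seen IPs first, truncated to `size` like a terms bucket list.
            "popular_ips": ip_counts.most_common(size),
            # Exact number of different IPs seen for this user.
            "distinct_ips": len(ip_counts),
        }
    return result


summary = ips_per_user(docs)
print(summary["alice"]["popular_ips"])
print(summary["alice"]["distinct_ips"])
```

The inner buckets already answer the "most seen IPs" question, since they come back ordered by document count. For the distinct count, note that a terms aggregation returns only the top `size` buckets (10 by default), so counting the returned IP buckets would undercount for any user with more addresses than that; the sketch sidesteps this by counting before truncating.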

On Tuesday, December 3, 2013 6:18:02 AM UTC+1, nicolas maillard wrote:



(Adrien Grand) #3

Hi Nicolas,

The aggregations framework present in Elasticsearch 1.0 beta 2 is still at
an early stage and doesn't have all the optimizations that facets have gained
over their years of existence. For example, if you compare terms facets
against terms aggregations on string terms, you may notice that terms
aggregations are significantly slower. The reason is that aggregations
don't yet know how to leverage terms ordinals in order to speed up the
generation of the buckets: this is something that will be addressed in the
1.0 release. There are other similar improvements planned for the
coming weeks, and performance numbers should hopefully get better in
upcoming releases.

--
Adrien Grand
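
To make the ordinals remark above more concrete, here is a minimal Python sketch of the general idea: each distinct term gets a small integer, documents store that integer, and bucket counts accumulate in an array instead of a string-keyed map. This is an illustration under assumptions only and does not reflect how Elasticsearch or Lucene actually implement it; the sample values and all variable names are hypothetical.

```python
from collections import Counter

# Hypothetical per-document values of a string field.
values = ["10.0.0.2", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.2"]

# Without ordinals: every document hashes and compares a full string.
by_string = Counter(values)

# With ordinals: build the sorted dictionary of distinct terms once,
# then represent each document's value as a small integer.
terms = sorted(set(values))
ordinal_of = {term: i for i, term in enumerate(terms)}
doc_ordinals = [ordinal_of[v] for v in values]

# Counting becomes an increment into a plain array indexed by ordinal.
counts = [0] * len(terms)
for ordinal in doc_ordinals:
    counts[ordinal] += 1

# Ordinals are resolved back to strings only when the buckets are built.
by_ordinal = {terms[i]: c for i, c in enumerate(counts)}
print(by_ordinal == dict(by_string))
```

Both paths produce the same counts; the difference is that the ordinal path touches strings once per distinct term rather than once per document.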



(nicolas maillard) #4

Thanks for the heads-up, Adrien.

Definitely looking forward to this release. I'm testing out the usability on
some of our use cases, and right now it is a little slow and very often
hits the RAM limit, even for this small table and a somewhat simple query.
Nonetheless, it is a great feature, and I am sure it will be even better by
the time it hits GA.
Thanks, ES, for all the hard work.

On Tuesday, December 3, 2013 10:34:05 AM UTC+1, Adrien Grand wrote:


