Facets and statistics with elasticsearch (on a subset)

Hi,

I have a problem with facets in elasticsearch. I have a table videos, a
table channels, 1 channel has many videos. I just want to show a donut with
% of views_count per channel on the X lastest videos.

In SQL:

SELECT SUM(views_count) FROM videos WHERE videos.channel_id = X ORDER BY published_at DESC LIMIT Y

I can do that on all videos but I don't arrive to do that with the LIMIT.

Any Idea??

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

SELECT SUM(views_count) FROM videos WHERE videos.channel_id = X ORDER BY published_at DESC LIMIT Y

It all depends on how you model your data in Elasticseach, that's the
hardest part. Assuming you'd have the video documents which have a
channel_id field and a views_count field, you can do a filtered 0
query with a term 1 filter on channel_id, and a terms facet 2
over views_count, and that will return a count (10, by default) of most
viewed videos. (In other scenarios, you could also use a facet_filter to
restrict facets to a certain channel_id, eg. when you'd like to get back
facets for multiple channels.)

Karel

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Well quite simple (you're assuming well) :

videos:
channel_id
views_count
published_at

When I do that query, the total / sum, etc... from statistical is on all
the data set. I would like have the sum of total views for the last 100
videos not the all dataset... Do you know what I mean?
I try size but it doesn't work...

On Sunday, March 17, 2013 6:07:15 PM UTC+1, Karel Minařík wrote:

SELECT SUM(views_count) FROM videos WHERE videos.channel_id = X ORDER BY published_at DESC LIMIT Y

It all depends on how you model your data in Elasticseach, that's the
hardest part. Assuming you'd have the video documents which have a
channel_id field and a views_count field, you can do a filtered 0
query with a term 1 filter on channel_id, and a terms facet 2
over views_count, and that will return a count (10, by default) of most
viewed videos. (In other scenarios, you could also use a facet_filter to
restrict facets to a certain channel_id, eg. when you'd like to get back
facets for multiple channels.)

Karel

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ah, OK, then just add a range filter on published_at to the query (to limit the facet computation), and use size to limit the number of results in the facet.

Karel

On Sunday, 17. March 2013 at 18:10, Fabien Guiraud wrote:

Well quite simple (you're assuming well) :

videos:
channel_id
views_count
published_at

When I do that query, the total / sum, etc... from statistical is on all the data set. I would like have the sum of total views for the last 100 videos not the all dataset... Do you know what I mean?
I try size but it doesn't work...

On Sunday, March 17, 2013 6:07:15 PM UTC+1, Karel Minařík wrote:

SELECT SUM(views_count) FROM videos WHERE videos.channel_id = X ORDER BY published_at DESC LIMIT Y

It all depends on how you model your data in Elasticseach, that's the hardest part. Assuming you'd have the video documents which have a channel_id field and a views_count field, you can do a filtered 0 query with a term 1 filter on channel_id, and a terms facet 2 over views_count, and that will return a count (10, by default) of most viewed videos. (In other scenarios, you could also use a facet_filter to restrict facets to a certain channel_id, eg. when you'd like to get back facets for multiple channels.)

Karel

--
You received this message because you are subscribed to a topic in the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/-lQsU4k0sNU/unsubscribe?hl=en-US.
To unsubscribe from this group and all its topics, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I put a sort DESC on published_at, but size works because it returns only
10 results for example, but the facet calculate on the entire dataset... :frowning:
Look :
curl -X GET
'http://localhost:9200/videos/video/_search?from=0&size=20&pretty' -d
'{"query":{"bool":{"must":[{"query_string":{"query":"*","default_operator":"AND"}}]}},"sort":[{"published_at":"desc"}],"facets":{"views_count":{"statistical":{"field":"views_count"}}},"size":20,"from":0}'

Do you see any errors?

Thanks :slight_smile:

On Sunday, March 17, 2013 6:38:24 PM UTC+1, Karel Minařík wrote:

Ah, OK, then just add a range filter on published_at to the query (to
limit the facet computation), and use size to limit the number of results
in the facet.

Karel

On Sunday, 17. March 2013 at 18:10, Fabien Guiraud wrote:

Well quite simple (you're assuming well) :

videos:
channel_id
views_count
published_at

When I do that query, the total / sum, etc... from statistical is on all
the data set. I would like have the sum of total views for the last 100
videos not the all dataset... Do you know what I mean?
I try size but it doesn't work...

On Sunday, March 17, 2013 6:07:15 PM UTC+1, Karel Minařík wrote:

SELECT SUM(views_count) FROM videos WHERE videos.channel_id = X ORDER BY published_at DESC LIMIT Y

It all depends on how you model your data in Elasticseach, that's the
hardest part. Assuming you'd have the video documents which have a
channel_id field and a views_count field, you can do a filtered 0
query with a term 1 filter on channel_id, and a terms facet 2
over views_count, and that will return a count (10, by default) of most
viewed videos. (In other scenarios, you could also use a facet_filter to
restrict facets to a certain channel_id, eg. when you'd like to get back
facets for multiple channels.)

Karel

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/-lQsU4k0sNU/unsubscribe?hl=en-US
.
To unsubscribe from this group and all its topics, send an email to
elasticsearc...@googlegroups.com <javascript:>.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

The sort is completely useles here -- it only applies to results returned,
not facets.

In your example you're using a statistical facet, which IMHO does
something different then you asked for in the original question. For the
terms facet, you control the number of returned aggregations by the size
parameter, see
docs: Elasticsearch Platform — Find real-time answers at scale | Elastic

Your bool query with string is also very different from what you've
asked in the original question and what I suggested as an Elasticsearch way
of expressing WHERE videos.channel_id = X. The query part in
Elasticsearch request restricts both search results and facets. If you only
care about facets, you should use search_type=count.

Karel

On Sunday, March 17, 2013 6:48:03 PM UTC+1, Fabien Guiraud wrote:

I put a sort DESC on published_at, but size works because it returns only
10 results for example, but the facet calculate on the entire dataset... :frowning:
Look :
curl -X GET '
http://localhost:9200/videos/video/_search?from=0&size=20&pretty' -d
'{"query":{"bool":{"must":[{"query_string":{"query":"*","default_operator":"AND"}}]}},"sort":[{"published_at":"desc"}],"facets":{"views_count":{"statistical":{"field":"views_count"}}},"size":20,"from":0}'

Do you see any errors?

Thanks :slight_smile:

On Sunday, March 17, 2013 6:38:24 PM UTC+1, Karel Minařík wrote:

Ah, OK, then just add a range filter on published_at to the query (to
limit the facet computation), and use size to limit the number of results
in the facet.

Karel

On Sunday, 17. March 2013 at 18:10, Fabien Guiraud wrote:

Well quite simple (you're assuming well) :

videos:
channel_id
views_count
published_at

When I do that query, the total / sum, etc... from statistical is on all
the data set. I would like have the sum of total views for the last 100
videos not the all dataset... Do you know what I mean?
I try size but it doesn't work...

On Sunday, March 17, 2013 6:07:15 PM UTC+1, Karel Minařík wrote:

SELECT SUM(views_count) FROM videos WHERE videos.channel_id = X ORDER BY published_at DESC LIMIT Y

It all depends on how you model your data in Elasticseach, that's the
hardest part. Assuming you'd have the video documents which have a
channel_id field and a views_count field, you can do a filtered 0
query with a term 1 filter on channel_id, and a terms facet 2
over views_count, and that will return a count (10, by default) of most
viewed videos. (In other scenarios, you could also use a facet_filter to
restrict facets to a certain channel_id, eg. when you'd like to get back
facets for multiple channels.)

Karel

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/-lQsU4k0sNU/unsubscribe?hl=en-US
.
To unsubscribe from this group and all its topics, send an email to
elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Okay nice, I understand a little more, I will try that, thanks !

On Mon, Mar 18, 2013 at 11:34 AM, Karel Minařík karel.minarik@gmail.comwrote:

The sort is completely useles here -- it only applies to results returned,
not facets.

In your example you're using a statistical facet, which IMHO does
something different then you asked for in the original question. For the
terms facet, you control the number of returned aggregations by the size
parameter, see docs:
Elasticsearch Platform — Find real-time answers at scale | Elastic

Your bool query with string is also very different from what you've
asked in the original question and what I suggested as an Elasticsearch way
of expressing WHERE videos.channel_id = X. The query part in
Elasticsearch request restricts both search results and facets. If you only
care about facets, you should use search_type=count.

Karel

On Sunday, March 17, 2013 6:48:03 PM UTC+1, Fabien Guiraud wrote:

I put a sort DESC on published_at, but size works because it returns only
10 results for example, but the facet calculate on the entire dataset... :frowning:
Look :
curl -X GET 'http://localhost:9200/videos/**video/_search?from=0&size=20&
**prettyhttp://localhost:9200/videos/video/_search?from=0&size=20&pretty'
-d '{"query":{"bool":{"must":[{"query_string":{"query":"*","
default_operator":"AND"}}]}},"sort":[{"published_at":"desc"}
],"facets":{"views_count":{"statistical":{"field":"views_
count"}}},"size":20,"from":0}'

Do you see any errors?

Thanks :slight_smile:

On Sunday, March 17, 2013 6:38:24 PM UTC+1, Karel Minařík wrote:

Ah, OK, then just add a range filter on published_at to the query (to
limit the facet computation), and use size to limit the number of results
in the facet.

Karel

On Sunday, 17. March 2013 at 18:10, Fabien Guiraud wrote:

Well quite simple (you're assuming well) :

videos:
channel_id
views_count
published_at

When I do that query, the total / sum, etc... from statistical is on all
the data set. I would like have the sum of total views for the last 100
videos not the all dataset... Do you know what I mean?
I try size but it doesn't work...

On Sunday, March 17, 2013 6:07:15 PM UTC+1, Karel Minařík wrote:

SELECT SUM(views_count) FROM videos WHERE videos.channel_id = X ORDER BY published_at DESC LIMIT Y

It all depends on how you model your data in Elasticseach, that's the
hardest part. Assuming you'd have the video documents which have a
channel_id field and a views_count field, you can do a filtered 0
query with a term [1] filter on channel_id, and a terms facet [2]
over views_count, and that will return a count (10, by default) of most
viewed videos. (In other scenarios, you could also use a facet_filter to
restrict facets to a certain channel_id, eg. when you'd like to get back
facets for multiple channels.)

Karel

filtered-query.htmlhttp://www.elasticsearch.org/guide/reference/query-dsl/filtered-query.html
[1]: http://www.elasticsearch.org/guide/reference/query-dsl/
term-filter.htmlhttp://www.elasticsearch.org/guide/reference/query-dsl/term-filter.html
[2]: http://www.elasticsearch.org/guide/reference/api/
search/facets/terms-facet.htmlhttp://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit https://groups.google.com/d/**
topic/elasticsearch/-**lQsU4k0sNU/unsubscribe?hl=en-**UShttps://groups.google.com/d/topic/elasticsearch/-lQsU4k0sNU/unsubscribe?hl=en-US
.
To unsubscribe from this group and all its topics, send an email to
elasticsearc...@googlegroups.**com.
For more options, visit https://groups.google.com/**groups/opt_outhttps://groups.google.com/groups/opt_out
.

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/-lQsU4k0sNU/unsubscribe?hl=en-US
.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
Fabien

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.