Elasticsearch Facets + limit results

Martijn_Dwars · May 22, 2014, 8:47am

I'm trying to construct the following SQL query in Elasticsearch:

SELECT companyId, COUNT(*) c FROM visits GROUP BY companyId ORDER BY c DESC LIMIT 2

I came up with the following JSON body for the query:

{
  "facets": {
    "company": {
      "filter": {
        "term": {
          "entityType": "companypage"
        }
      },
      "terms": {
        "field": "entityId",
        "size": 2
      }
    }
  }
}

When I use "size": 2, I get the following result:

facets: {
  company: {
    _type: terms
    missing: 0
    total: 4
    other: 0
    terms: [{
      term: 2
      count: 3
    },
    {
      term: 20
      count: 1
    }]
  }
}

When I use "size": 1, I get the following result:

facets: {
  company: {
    _type: terms
    missing: 0
    total: 4
    other: 2
    terms: [{
      term: 2
      count: 2
    }]
  }
}

How is it possible that the count for term 2 is 3 in the first response,
but 2 in the second response?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/45ccf14f-81f2-4a9e-b598-6eb120f46197%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

emeschitc · May 22, 2014, 1:44pm

From the documentation:
The size parameter defines how many top terms should be returned out of the
overall terms list. By default, the node coordinating the search process
will ask each shard to provide its own top size terms and once all shards
respond, it will reduce the results to the final list that will then be
sent back to the client. This means that if the number of unique terms is
greater than size, the returned list is slightly off and not accurate (it
could be that the term counts are slightly off and it could even be that a
term that should have been in the top size entries was not returned).
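
To make the quoted behaviour concrete, here is a minimal Python sketch of the per-shard "top size" step followed by the reduce on the coordinating node. It is a simulation, not Elasticsearch code. The two-shard document layout and the tie-break rule (on equal counts the larger term wins) are assumptions, chosen only because they reproduce the numbers in the question. The real shard layout and ordering may differ.

```python
from collections import Counter


def shard_top_terms(shard_docs, size):
    """Return one shard's local top `size` (term, count) pairs."""
    counts = Counter(shard_docs)
    # Assumed tie-break: on equal counts, the larger term sorts first.
    ranked = sorted(counts.items(), key=lambda tc: (-tc[1], -tc[0]))
    return ranked[:size]


def terms_facet(shards, size):
    """Merge the per-shard top lists, as a coordinating node would."""
    merged = Counter()
    for shard_docs in shards:
        for term, count in shard_top_terms(shard_docs, size):
            merged[term] += count
    total = sum(len(shard_docs) for shard_docs in shards)
    top = sorted(merged.items(), key=lambda tc: (-tc[1], -tc[0]))[:size]
    # "other" is everything counted in total but not in the returned terms.
    other = total - sum(count for _, count in top)
    return {"total": total, "other": other, "terms": top}


# Hypothetical layout: term 2 appears three times, but split across shards.
shards = [[2, 2], [2, 20]]

print(terms_facet(shards, size=2))
# {'total': 4, 'other': 0, 'terms': [(2, 3), (20, 1)]}
print(terms_facet(shards, size=1))
# {'total': 4, 'other': 2, 'terms': [(2, 2)]}
```

With size 1, the second shard reports only term 20, so its single occurrence of term 2 never reaches the coordinating node and the merged count drops from 3 to 2.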

emeschitc · May 22, 2014, 1:55pm

How is it possible that the count for term 2 is 3 in the first response,
but 2 in the second response?

From the docs:

The size parameter defines how many top terms should be returned out of the
overall terms list. By default, the node coordinating the search process
will ask each shard to provide its own top size terms and once all shards
respond, it will reduce the results to the final list that will then be
sent back to the client. This means that if the number of unique terms is
greater than size, the returned list is slightly off and not accurate (it
could be that the term counts are slightly off and it could even be that a
term that should have been in the top size entries was not returned).

Martijn_Dwars · May 24, 2014, 1:21pm

Is there any way to accomplish the same goal and get an accurate result?

jpountz · May 24, 2014, 5:12pm

One way to improve accuracy would be to increase shard_size[1]. In
particular, if shard_size is greater than the number of unique values of your
entityId field, results will be accurate. Be aware, however, that this
can be resource-intensive.

Another option would be to route your indexing requests so that all
documents having the same entityId will end up on the same shard.
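
As a rough illustration of both suggestions, the Python sketch below builds the request body from the question with a shard_size added, and a routed index path. It only constructs the JSON and the path and sends nothing. It assumes the Elasticsearch version in use supports shard_size on the terms facet and the routing parameter on index requests. The value 100, the index name "visits", the type name "visit", and the document id are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode


def terms_facet_body(size, shard_size):
    """Return the facet request from the question, plus shard_size."""
    return {
        "facets": {
            "company": {
                # Copied unchanged from the body in the question.
                "filter": {"term": {"entityType": "companypage"}},
                "terms": {
                    "field": "entityId",
                    "size": size,
                    # Each shard returns this many candidate terms.
                    "shard_size": shard_size,
                },
            }
        }
    }


def routed_index_path(index, doc_type, doc_id, entity_id):
    """Return an index path that routes the document by its entityId."""
    query = urlencode({"routing": entity_id})
    return f"/{index}/{doc_type}/{doc_id}?{query}"


# 100 is an arbitrary value; it should exceed the number of unique entityIds.
body = terms_facet_body(size=2, shard_size=100)
print(json.dumps(body, indent=2))

# Hypothetical index, type, and id; documents with entityId 20 share a shard.
print(routed_index_path("visits", "visit", "abc", 20))
```

The two approaches trade off differently: a large shard_size costs memory and network traffic on every search, while routing by entityId is paid once at indexing time but can produce unevenly sized shards if a few entities dominate.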

[1]

--
Adrien Grand