Exclude specific bucket with integer key from term aggregation


(Michele Palmia) #1

Hi all!

My documents contain an integer array field storing the ids of the tags
describing them. Given a specific tag id, I want to extract a list of the top
tags that occur most frequently together with the provided one.

I can solve this problem by combining a terms aggregation over the tag id
field with a term filter over the same field, but the list I get back
obviously always starts with the tag id I provide: all documents matching
my filter have that tag, so it is the first in the list. I thought of using
the exclude field
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values
to avoid creating the problematic bucket, but as I'm dealing with an
integer field, that does not seem to be possible: this query

{
  "size": 0,
  "query": {
    "term": {
      "tag_ids": "00001"
    }
  },
  "aggs": {
    "tags": {
      "terms": {
        "size": 3,
        "field": "tag_ids",
        "exclude": "00001"
      }
    }
  }
}

returns an error saying that

Aggregation [tags] cannot support the include/exclude settings as it can only be applied to string values.

Is it possible to avoid getting back this bucket in some way?
Unfortunately, I can only use ES 1.2 (the AWS plugin is not yet ready for 1.3).
I'm mostly wary of dealing with this problem after query execution, because
the bucket corresponding to the query is not guaranteed to be the first one
in the list, for example when there are only a few matching
documents, all having exactly the same two tags.
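
One possible client-side workaround for the ordering concern above is to request one extra bucket and drop the queried tag by key rather than by position. The following is only a sketch: the helper names `build_request` and `co_occurring_tags` are hypothetical, the tag id is passed as an integer, and the response is assumed to have the usual `aggregations -> tags -> buckets` shape with `key` and `doc_count` entries.

```python
from typing import Any, Dict, List


def build_request(tag_id: int, top_n: int) -> Dict[str, Any]:
    """Build the search body, asking for one bucket more than needed.

    The extra bucket leaves room for the queried tag itself, which is
    filtered out afterwards on the client.
    """
    return {
        "size": 0,
        "query": {"term": {"tag_ids": tag_id}},
        "aggs": {
            "tags": {"terms": {"field": "tag_ids", "size": top_n + 1}}
        },
    }


def co_occurring_tags(
    response: Dict[str, Any], tag_id: int, top_n: int
) -> List[Dict[str, Any]]:
    """Return up to top_n buckets, skipping the queried tag wherever it is."""
    buckets = response["aggregations"]["tags"]["buckets"]
    # Compare by key, not by position: with ties the queried tag
    # is not guaranteed to be the first bucket.
    kept = [b for b in buckets if int(b["key"]) != tag_id]
    return kept[:top_n]


if __name__ == "__main__":
    # Hand-written sample response (not real output): two documents,
    # both tagged 1 and 7, so the queried tag 1 is tied and not first.
    sample = {
        "aggregations": {
            "tags": {
                "buckets": [
                    {"key": 7, "doc_count": 2},
                    {"key": 1, "doc_count": 2},
                    {"key": 9, "doc_count": 1},
                ]
            }
        }
    }
    print(co_occurring_tags(sample, tag_id=1, top_n=2))
```

Because every matching document carries the queried tag, its bucket has the highest possible count, so asking for `top_n + 1` buckets should still leave `top_n` other tags after filtering.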

Thank you in advance!
Michele

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c748f0bc-c2e9-4340-8936-f41345c71d55%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Luke Nezda) #2

I have this problem too. It was easily solved using the Terms Facet's
exclude feature
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-facet.html#_excluding_terms),
but I haven't found an equivalent solution with aggregations
either. Here's a gist demonstrating this:
https://gist.github.com/nezda/60932c73a8485e9d9a49
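
For readers unfamiliar with facets, the facet-based request described above might look roughly like the body built below. This is a sketch based on the terms facet documentation linked above, not a copy of the gist: the helper name `build_facet_request` is hypothetical, and passing the excluded tag id as an integer in the `exclude` list is an assumption. Facets were deprecated in favor of aggregations in Elasticsearch 1.x.

```python
from typing import Any, Dict


def build_facet_request(tag_id: int, top_n: int) -> Dict[str, Any]:
    """Build a search body using a terms facet with an exclude list.

    Assumption: the facet's "exclude" option takes a list of terms to
    leave out of the result, here the tag id that is being queried.
    """
    return {
        "size": 0,
        "query": {"term": {"tag_ids": tag_id}},
        "facets": {
            "tags": {
                "terms": {
                    "field": "tag_ids",
                    "size": top_n,
                    "exclude": [tag_id],
                }
            }
        },
    }


if __name__ == "__main__":
    import json

    # Print the request body that would be sent to the _search endpoint.
    print(json.dumps(build_facet_request(tag_id=1, top_n=3), indent=2))
```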

In reply to Michele Palmia's message of Thursday, August 7, 2014 10:54:43 AM UTC-5.


(Michele Palmia) #3

I added a comment to a GitHub issue, opened a while ago, about the exclude
feature of terms aggregations: I think this is something that should be
fixed.

In reply to Luke Nezda's message of Fri, Aug 15, 2014 at 8:31 PM.

