My documents contain an integer array field storing the ids of the tags
that describe them. Given a specific tag id, I want to extract a list of the
tags that occur most frequently together with it.
I can solve this problem by combining a terms aggregation over the tag id
field with a term filter over the same field, but the list I get back
obviously always starts with the tag id I provide: all documents matching
my filter have that tag, so it is the first in the list. I thought of using
the exclude field http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_filtering_values
to avoid creating the problematic bucket, but since I'm dealing with an
integer field, that does not seem to be possible. The query fails with:
Aggregation [tags] cannot support the include/exclude settings as it can
only be applied to string values.
Is it possible to avoid getting back this bucket in some way?
Unfortunately, I can only use ES 1.2 (AWS plugin not yet ready for 1.3).
I'm mostly wary of dealing with this problem after query execution, because
the bucket corresponding to the queried tag is not guaranteed to be the first one
in the list, for example when there are only a few matching
documents, all having exactly the same two tags.
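
For reference, here is a rough sketch of the setup described above, plus one possible client-side workaround. It is not taken from the thread: the field name "tags", the helper names and the sample response are all assumptions made for illustration. The idea is to request one extra bucket and then drop the queried tag by its key rather than by its position, so it does not matter where that bucket lands in the list.

```python
# Sketch only: the field name "tags" and both helper names are hypothetical.

def build_cooccurrence_query(tag_id, top_n=10, field="tags"):
    """Build an ES 1.x style request body: term filter + terms aggregation."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "query": {"filtered": {"filter": {"term": {field: tag_id}}}},
        # Ask for one extra bucket, because the queried tag itself
        # will be among the results and is dropped afterwards.
        "aggs": {"tags": {"terms": {"field": field, "size": top_n + 1}}},
    }


def top_cooccurring(response, tag_id, top_n=10):
    """Drop the bucket whose key equals the queried tag, wherever it is."""
    buckets = response["aggregations"]["tags"]["buckets"]
    return [b for b in buckets if b["key"] != tag_id][:top_n]


# Hand-written sample response (not real data): the queried tag 42 ties
# with tag 7 and happens to come second, the case described above.
sample_response = {
    "aggregations": {
        "tags": {
            "buckets": [
                {"key": 7, "doc_count": 3},
                {"key": 42, "doc_count": 3},
                {"key": 9, "doc_count": 1},
            ]
        }
    }
}

for bucket in top_cooccurring(sample_response, 42, top_n=2):
    print(bucket["key"], bucket["doc_count"])
```

Because the comparison is on the bucket key, a tie in doc_count between the queried tag and another tag does not change the outcome.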
On Thursday, August 7, 2014 10:54:43 AM UTC-5, Michele Palmia wrote:
I added a comment to an issue opened a while ago on GitHub about the exclude
feature of terms aggregations: I think this is something that should be
fixed.
On Fri, Aug 15, 2014 at 8:31 PM, Luke Nezda lnezda@gmail.com wrote: