Filter output of term facet - not input

I've read the Elasticsearch API docs and done a lot of googling, but I
still cannot find a solution, so I'm posting here. I need to output only
the filtered items from the faceted term search.

Here is some sample data:
{
  id: 1e27202c54a0a600e06257c0ae341e8e
  interaction_type: bitly
  url:
  created_at: 2013-02-08T15:18:34.000Z
  epoch: 1360336714000
  tags: [
    userid_5114575ae4b0cb71b6654320,
    username_testuser1,
    microsoft
  ]
  geo_latitude: 51.900002
  geo_longitude: 8.3833
  geo_city: Gütersloh
  geo_country_code: DE
  geo_country: Germany
  geo_region_code: 07
  geo_region: Nordrhein-Westfalen
}
{
  id: 1e27202c2e7aac00e062d233dde576aa
  interaction_type: bitly
  url:
  created_at: 2013-02-08T15:18:28.000Z
  epoch: 1360336708000
  tags: [
    userid_5114575ae4b0cb71b6654321,
    username_testuser2,
    kinect
  ]
  geo_latitude: 23.051201
  geo_longitude: 112.459702
  geo_city: Zhaoqing
  geo_country_code: CN
  geo_country: China
  geo_region_code: 30
  geo_region: Guangdong
}

I want to find the number of occurrences for a bunch of user-ids for a
range of times. I came up with a filtered and faceted query like so:

{
  "query": {
    "range": {
      "created_at": { "from": "now-10d", "to": "now" }
    }
  },
  "from": 0,
  "size": 0,
  "facets": {
    "tag_facet": {
      "terms": { "field": "tags" },
      "facet_filter": {
        "or": [
          { "term": { "tags": "userid_5114575ae4b0cb71b6654321" } },
          { "term": { "tags": "userid_5114575ae4b0cb71b6654320" } }
        ]
      }
    }
  }
}

The result I get is:

facets: {
  tag_facet: {
    _type: terms
    missing: 0
    total: 1947503
    other: 305
    terms: [
      {term: username_testuser1, count: 539453}
      {term: userid_5114575ae4b0cb71b6654320, count: 539453}
      {term: iphone, count: 245888}
      {term: microsoft, count: 193543}
      {term: userid_50f06636e4b0560131c8730c, count: 107155}
      {term: kinect, count: 101051}
    ]
  }
}

The result also includes counts for other tags like username_testuser1,
microsoft, kinect etc. I don't want those results, only the counts for x
user-ids selected with [or] filters, where I will limit x to no more
than 10.

Any guidance on how to solve this? There could be 1000s of results and I
don't want to iterate through them in the app layer to find the two items
that are needed.

Thanks!
-Ripple

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

A facet filter restricts which documents are faceted on; it does not
filter the facet terms that are returned.

In your example, your first document has
tags: [
  userid_5114575ae4b0cb71b6654320,
  username_testuser1,
  microsoft
]

Since this document passes the filter, all of its tag values are counted
in the facet. You would need to iterate through them in the app layer.
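
To make the behaviour concrete, here is a minimal Python sketch that mimics it in memory. It assumes a terms facet simply counts every value of the field on the documents that pass the facet filter. The function name `terms_facet` is a hypothetical stand-in (this is not Elasticsearch code), and the third document is invented for illustration.

```python
from collections import Counter

# In-memory documents modelled on the sample data in the question.
# The third one is made up: it carries none of the wanted user ids.
docs = [
    {"tags": ["userid_5114575ae4b0cb71b6654320", "username_testuser1", "microsoft"]},
    {"tags": ["userid_5114575ae4b0cb71b6654321", "username_testuser2", "kinect"]},
    {"tags": ["userid_50f06636e4b0560131c8730c", "iphone"]},
]


def terms_facet(documents, wanted_tags):
    """Count all tag values on documents that carry at least one wanted tag.

    This imitates a facet_filter: it selects documents, not facet terms.
    """
    counts = Counter()
    for doc in documents:
        if wanted_tags & set(doc["tags"]):  # the document passes the filter
            counts.update(doc["tags"])      # so every one of its tags is counted
    return counts


wanted = {"userid_5114575ae4b0cb71b6654320", "userid_5114575ae4b0cb71b6654321"}
facet = terms_facet(docs, wanted)

# Narrowing the returned terms happens afterwards, in the application layer.
only_wanted = {term: n for term, n in facet.items() if term in wanted}
print(sorted(facet))
print(only_wanted)
```

The first print lists the unwanted tags (microsoft, kinect, the usernames) alongside the two user ids, which is the effect described in the question; the second shows the app-layer filtering step.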

--
Ivan


Oh well. I had a feeling it couldn't be done. Thanks for verifying my
thoughts.


Maybe I am missing something, but why not add a filter on the user to your query? That would limit your result set, and the facets would only have the users you want.


Actually, you have two options:

  1. Exclude the tags you don't want counted (e.g. microsoft,
    kinect, etc.).
  2. Use a regex pattern for the terms you want included.

#2 would be my choice, because it looks like you can use a basic
expression such as "username_.*$", or even match your specific
users with "userid_5114575ae4b0cb71b6654321|userid_5114575ae4b0cb71b6654320".

See the section on excluding terms and regex patterns in the terms facet
documentation.
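
As a rough illustration of option 2, the Python sketch below builds the facet part of a request body from a list of user ids, joining them into one alternation pattern for the `regex` parameter of the terms facet. The helper name `build_regex_facet` is hypothetical. The local check with Python's `re` module only approximates the server side: it assumes the pattern must match the whole term, and Elasticsearch uses Java regular expressions rather than Python's.

```python
import json
import re


def build_regex_facet(user_ids, field="tags"):
    """Return a terms facet that should report only the given user-id terms."""
    # re.escape is a precaution; ids made of letters, digits and
    # underscores come out unchanged.
    pattern = "|".join(re.escape(uid) for uid in user_ids)
    return {
        "tag_facet": {
            "terms": {
                "field": field,
                "size": len(user_ids),
                "regex": pattern,
            }
        }
    }


user_ids = ["userid_5114575ae4b0cb71b6654321", "userid_5114575ae4b0cb71b6654320"]
facets = build_regex_facet(user_ids)
print(json.dumps({"facets": facets}, indent=2))

# Local approximation: which terms from the question's result would be kept?
pattern = re.compile(facets["tag_facet"]["terms"]["regex"])
returned = ["username_testuser1", "userid_5114575ae4b0cb71b6654320", "iphone", "microsoft"]
kept = [term for term in returned if pattern.fullmatch(term)]
print(kept)
```

Option 1 would instead list the unwanted values in the facet's `exclude` parameter, which only works well when the set of unwanted tags is small and known in advance.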

Hope this helps.

Thanks,
Matt Weber

On Tue, Feb 19, 2013 at 1:43 PM, AlexR roytmana@gmail.com wrote:

Maybe I am missing something but why not add filter on user to your query then you will limit your resultset and facets will only have the user you want


Matt, thank you very much for the regex suggestion. It is awesome. This is
a sample of my resulting query. I think providing both a query filter and
a facet filter might be overkill, so I might remove it, but it works:

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": {
            "range": {
              "created_at": {
                "from": "now-30d",
                "to": "now",
                "include_lower": true,
                "include_upper": true
              }
            }
          }
        }
      },
      "filter": {
        "or": {
          "filters": [
            {"term": {"tags": "userid_50fd9f5373e13056e76f9e7f"}},
            {"term": {"tags": "userid_51007ef3e4b0a99e8714a9e9"}},
            {"term": {"tags": "userid_50e228cae4b05800e6ea1ef2"}},
            {"term": {"tags": "userid_50fdaee7e4b05ced2d76710e"}},
            {"term": {"tags": "userid_50c6dfdee4b030a63367a6e7"}},
            {"term": {"tags": "userid_50f4680ce4b080e3535795dc"}},
            {"term": {"tags": "userid_50ddf00ae4b000084a2dc057"}},
            {"term": {"tags": "userid_50c6e080e4b030a63367a6e9"}},
            {"term": {"tags": "userid_50f06636e4b0560131c8730c"}}
          ]
        }
      }
    }
  },
  "fields": ["id", "tags"],
  "facets": {
    "usageFacet": {
      "terms": {
        "field": "tags",
        "size": 9,
        "regex": "userid_.*$"
      },
      "facet_filter": {
        "or": {
          "filters": [
            {"term": {"tags": "userid_50fd9f5373e13056e76f9e7f"}},
            {"term": {"tags": "userid_51007ef3e4b0a99e8714a9e9"}},
            {"term": {"tags": "userid_50e228cae4b05800e6ea1ef2"}},
            {"term": {"tags": "userid_50fdaee7e4b05ced2d76710e"}},
            {"term": {"tags": "userid_50c6dfdee4b030a63367a6e7"}},
            {"term": {"tags": "userid_50f4680ce4b080e3535795dc"}},
            {"term": {"tags": "userid_50ddf00ae4b000084a2dc057"}},
            {"term": {"tags": "userid_50c6e080e4b030a63367a6e9"}},
            {"term": {"tags": "userid_50f06636e4b0560131c8730c"}}
          ]
        }
      }
    }
  }
}

Another option would have been to use the query facet, but this one serves
better. The query facet is described in the Elasticsearch facets
documentation.


Yeah, lose the facet filter. Move it up as another "must" clause in
your boolean query filter, and don't use an "or" filter; use a terms
filter (TermsFilter) instead, as that will give you better performance.
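
One possible reading of that advice, sketched as a Python helper that assembles the request body. It assumes the `filtered` query and `terms` filter syntax of the Elasticsearch releases that still support facets; the helper name `build_usage_query` and its parameters are made up for illustration.

```python
import json


def build_usage_query(user_ids, days=30, field="tags"):
    """Restrict the documents once in the query; facet without a facet_filter."""
    return {
        "size": 0,
        "query": {
            "filtered": {
                "query": {
                    "range": {"created_at": {"from": f"now-{days}d", "to": "now"}}
                },
                # A single terms filter replaces the list of OR-ed term filters.
                "filter": {"terms": {field: list(user_ids)}},
            }
        },
        "facets": {
            "usage_facet": {
                "terms": {
                    "field": field,
                    "size": len(user_ids),
                    "regex": "userid_.*$",
                }
            }
        },
    }


body = build_usage_query(
    ["userid_50fd9f5373e13056e76f9e7f", "userid_51007ef3e4b0a99e8714a9e9"]
)
print(json.dumps(body, indent=2))
```

Because the facet runs over the documents matched by the query, filtering in the query alone already limits what is counted, so the duplicate facet filter can go.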


Once again more thanks. Refined it further:

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "created_at": {
              "from": "now-30d",
              "to": "now",
              "include_lower": true,
              "include_upper": true
            }
          }
        },
        {
          "terms": {
            "tags": [
              "userid_511a8b7ae4b041a03f7fb05a",
              "userid_511bd133e4b051c4e5f5a6a9",
              "userid_511bd195e4b051c4e5f5a6ab",
              "userid_511bd223e4b051c4e5f5a6ae",
              "userid_5123f4a4e4b0f7e78834fd1a",
              "userid_51149015e4b02131feaf81bd",
              "userid_511d8916e4b075a52833d23e",
              "userid_511e6e3fe4b075a52833d243",
              "userid_511d89a8e4b075a52833d23f",
              "userid_5112ea0e4f7ecb3372000003"
            ]
          }
        }
      ]
    }
  },
  "fields": ["id", "tags"],
  "facets": {
    "usage_facet": {
      "terms": {
        "field": "tags",
        "size": 10,
        "regex": "userid_.*$"
      }
    }
  }
}
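
On the application side, reading the counts out of the response could look like the Python sketch below. It assumes the response has the same shape as the facet result shown in the first post (a `terms` list of `term`/`count` pairs under the facet name). The function name `counts_by_user` and the counts in the sample response are invented for illustration. Users without matching documents would normally be absent from the facet, so the sketch fills them in with zero.

```python
def counts_by_user(response, user_ids, facet_name="usage_facet"):
    """Map each requested user id to its facet count, defaulting to 0."""
    found = {
        entry["term"]: entry["count"]
        for entry in response["facets"][facet_name]["terms"]
    }
    return {uid: found.get(uid, 0) for uid in user_ids}


# Invented response fragment, shaped like the facet output quoted earlier.
sample_response = {
    "facets": {
        "usage_facet": {
            "_type": "terms",
            "terms": [
                {"term": "userid_511a8b7ae4b041a03f7fb05a", "count": 42},
                {"term": "userid_511bd133e4b051c4e5f5a6a9", "count": 7},
            ],
        }
    }
}

requested = [
    "userid_511a8b7ae4b041a03f7fb05a",
    "userid_511bd133e4b051c4e5f5a6a9",
    "userid_511bd195e4b051c4e5f5a6ab",
]
print(counts_by_user(sample_response, requested))
```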
