Facets on nested objects


(oreno) #1

Hi,
I'm trying to run a facet on nested objects, but at the moment I'm not getting the results I'm after.
In the query below, the main query is supposed to count the number of users who have at least one nested object (buying event) between 2013-10-01 and 2013-10-10 that also has a TV as one of its products.
On the facet side I want the same result, but computed by the facet. That way I can get the counts for all products between those dates (not just TV) without running the main query once per product, which is how my system works now.

What I expect is the same count from the main query ("total": 284070) and from the facet ("term": "TV", "count": 535445), which is not the case at the moment.

I believe the facet is giving me the total count of nested objects (events) that match the condition, instead of the total number of unique users, which is what I'm after.

Does anyone know what I'm doing wrong here?

*When I set the range to a single day, the results are the same, as expected.
It seems that when running over a multi-day range, some users are counted more than once.
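The double counting described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not Elasticsearch code: the sample users, dates, and the helper names `events_per_product` and `users_per_product` are hypothetical. `events_per_product` counts matching nested event documents, which is the behaviour suspected of the nested facet; `users_per_product` counts each user at most once per product, which matches what the nested query's `hits.total` counts (root documents with at least one matching event). For simplicity it counts over all users rather than only those matched by the TV query; for the TV term itself that makes no difference, since every user with an in-range TV event matches that query.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data shaped like the "user" mapping: each user has nested
# "events", and each event has an event_time and a list of product objects.
USERS = [
    {"name": "u1", "events": [
        {"event_time": date(2013, 10, 1), "products": [{"product": "TV"}, {"product": "RADIO"}]},
        {"event_time": date(2013, 10, 3), "products": [{"product": "TV"}]},
    ]},
    {"name": "u2", "events": [
        {"event_time": date(2013, 10, 2), "products": [{"product": "DISHWASHER"}]},
    ]},
    {"name": "u3", "events": [
        {"event_time": date(2013, 10, 5), "products": [{"product": "TV"}, {"product": "DISHWASHER"}]},
        {"event_time": date(2013, 11, 1), "products": [{"product": "RADIO"}]},
    ]},
]


def in_range(event, start, end):
    # Inclusive on both ends, like "include_lower"/"include_upper": true.
    return start <= event["event_time"] <= end


def event_products(event):
    # Distinct product names within one event.
    return {p["product"] for p in event["products"]}


def events_per_product(users, start, end):
    """Count matching nested event documents per product (one per event)."""
    counts = Counter()
    for user in users:
        for event in user["events"]:
            if in_range(event, start, end):
                counts.update(event_products(event))
    return counts


def users_per_product(users, start, end):
    """Count root user documents per product (each user at most once)."""
    counts = Counter()
    for user in users:
        seen = set()
        for event in user["events"]:
            if in_range(event, start, end):
                seen |= event_products(event)
        counts.update(seen)
    return counts


if __name__ == "__main__":
    start, end = date(2013, 10, 1), date(2013, 10, 10)
    print(events_per_product(USERS, start, end)["TV"])  # 3: u1 twice, u3 once
    print(users_per_product(USERS, start, end)["TV"])   # 2: u1 and u3
```

With this made-up data, TV gets 3 events but only 2 users for 2013-10-01 to 2013-10-10, because u1 bought a TV on two different days. For the single day 2013-10-01 both functions return the same counts, which is consistent with the single-day observation above.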

Thanks in advance,

curl -XPOST 'http://XXXXX:9200/sample/_search?pretty=true' -d '{
"size": 0,
"query": {
"nested": {
"query": {
"bool": {
"must": [{
"term": {
"events.products.product": "TV"
}
},
{
"range": {
"events.event_time": {
"from": "2013-10-01",
"to": "2013-10-10",
"include_lower": true,
"include_upper": true
}
}
}]
}
},
"path": "events"
}
},
"facets": {
"tags": {
"terms": {
"field": "product",
"size": 200
},
"nested": "events",
"facet_filter": {
"range": {
"events.event_time": {
"from": "2013-10-01",
"to": "2013-10-10",
"include_lower": true,
"include_upper": true
}
}
}
}
}
}'

response:

{
"took": 96,
"timed_out": false,
"_shards": {
"total": 20,
"successful": 20,
"failed": 0
},
"hits": {
"total": 284070,
"max_score": 4.171436,
"hits": []
},
"facets": {
"tags": {
"_type": "terms",
"missing": 0,
"total": 13036875,
"other": 1901080,
"terms": [{
"term": "TV",
"count": 535445
},
{
"term": "DISHWASHER",
"count": 375003
},
{
"term": "RADIO",
"count": 316831
},
.....

mapping:

{
"user": {
"_ttl": {
"enabled": true
},
"properties": {
"name": {
"type": "string"
},
"events": {
"type": "nested",
"properties": {
"event_time": {
"type": "date"
},
"products": {
"properties": {
"product": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
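To see why counting on a nested path counts events rather than users, it helps to picture how a nested mapping is stored: Elasticsearch indexes each nested object as a separate hidden document kept alongside its root document. The sketch below is a hypothetical Python illustration of that layout for the mapping above, not Elasticsearch's actual code, and the function name `to_index_docs` is made up. Field names use the full dotted path, the same form as `events.products.product` in the main query's term clause. Because `products` is a plain object rather than nested, its values collapse into one array inside each event document.

```python
# Hypothetical illustration (not Elasticsearch code) of how a "nested" mapping
# splits one user into several indexed documents.
def to_index_docs(user):
    docs = []
    for event in user["events"]:
        # One hidden document per nested event, using full dotted field paths.
        # "products" is a plain object, so its values flatten into one array.
        docs.append({
            "events.event_time": event["event_time"],
            "events.products.product": [p["product"] for p in event["products"]],
        })
    # The root (user) document is stored after its nested children.
    docs.append({"name": user["name"]})
    return docs


if __name__ == "__main__":
    user = {
        "name": "u1",
        "events": [
            {"event_time": "2013-10-01", "products": [{"product": "TV"}, {"product": "RADIO"}]},
            {"event_time": "2013-10-03", "products": [{"product": "TV"}]},
        ],
    }
    for doc in to_index_docs(user):
        print(doc)
```

Here one user yields two event documents that both contain TV, so anything that counts per nested document sees TV twice for this single user.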


(Alexander Reelsen) #2

Hey,

Wondering about the facet part a bit (maybe I didn't look closely enough): shouldn't the facet field be "products.product" instead of just "products", in order to filter correctly?

Can you create a small gist that reproduces this behaviour?

--Alex

On Sun, Dec 29, 2013 at 10:14 AM, oreno oreno@exelate.com wrote:

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/facets-on-nested-objects-tp4046760.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1388308488580-4046760.post%40n3.nabble.com.
For more options, visit https://groups.google.com/groups/opt_out.


