Inconsistent facet count


(stratospark) #1

Hi,

I'm trying to build a drill-down search interface, with facet counts
that update as you drill down. I'm currently mapping all my strings
as multi_field, with the non-analyzed string value going into
"attribute.orig" and the analyzed one going into plain "attribute".

Here are two sets of requests/responses. In the first one I ask for
the top location_name facet, which returns a count of 17. When I narrow
the query down to only match location_name.orig "49296", the
facet count increases to 24.

I've also tried applying a location_name.orig filter to the match_all
query, but that gives the same result.
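
For readers who want to reproduce the filter variant mentioned above, it might look roughly like the sketch below. This is an assumption about what was tried, not the exact request: it wraps match_all in a "filtered" query with a term filter, and builds the body in Python only so it can be printed and passed to curl. The helper name filtered_facet_request is hypothetical.

```python
import json

# Field name taken from the requests in this post.
FIELD = "location.location_name.orig"

def filtered_facet_request(term, size=1):
    """Build a search body: match_all narrowed by a term filter, plus a terms facet.

    Hypothetical helper; assumes the "filtered" query syntax of
    Elasticsearch releases from this era.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {FIELD: term}},
            }
        },
        "facets": {
            "location.location_name": {"terms": {"field": FIELD, "size": size}}
        },
        "size": 0,
    }

body = filtered_facet_request("49296")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent with curl -XPOST http://127.0.0.1:9200/_search -d '...' in the same way as the requests below.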

Am I doing something wrong?

Thanks for the help!
-pat

curl -XPOST http://127.0.0.1:9200/_search -d '
{
  "query": {
    "match_all": {}
  },
  "facets": {
    "location.location_name": {
      "terms": {
        "field": "location.location_name.orig",
        "size": 1
      }
    }
  },
  "size": 0
}'

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 20,
    "successful": 20,
    "failed": 0
  },
  "hits": {
    "total": 41526,
    "max_score": 1.0,
    "hits": []
  },
  "facets": {
    "location.location_name": {
      "_type": "terms",
      "missing": 8,
      "terms": [
        {
          "term": "49296",
          "count": 17
        }
      ]
    }
  }
}

curl -XPOST http://127.0.0.1:9200/_search -d '
{
  "query": {
    "term": {
      "location.location_name.orig": "49296"
    }
  },
  "facets": {
    "location.location_name": {
      "terms": {
        "field": "location.location_name.orig",
        "size": 1
      }
    }
  },
  "size": 0
}'

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 20,
    "successful": 20,
    "failed": 0
  },
  "hits": {
    "total": 24,
    "max_score": 8.651358,
    "hits": []
  },
  "facets": {
    "location.location_name": {
      "_type": "terms",
      "missing": 0,
      "terms": [
        {
          "term": "49296",
          "count": 24
        }
      ]
    }
  }
}


(Shay Banon) #2

Can you gist a recreation where you also index sample docs?

On Tuesday, May 24, 2011 at 1:53 AM, stratospark wrote:

[quoted text of post #1 trimmed]


(stratospark) #3

I created a simplified sample, and the results are even stranger.

I also noticed it wasn't picking up my "include_in_all" option in the
dynamic template, which I think I need in order to generate facets on
multi_fields?

https://gist.github.com/987814

Thanks so much for the help!

-pat

On May 23, 3:55 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Can you gist a recreation where you also index sample docs?



(Shay Banon) #4

This happens because of the way the distributed facet calculation works. It gets the top 5 from each shard and then aggregates them. Because your terms are evenly distributed, a term can miss the top list on some shards, and its counts from those shards are then left out, so the totals will not be exact. If you increase the size, you will get better results. One possible value is the size times the number of shards, for example, 25.
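
To make the effect concrete, here is a small simulation of the merge described above. It is only a sketch under assumptions: the shard contents are made up (chosen so the totals match the 17 and 24 from the first post), the function names are hypothetical, and plain Python counters stand in for whatever Elasticsearch does internally.

```python
from collections import Counter

def shard_top_terms(shard_docs, size):
    """Top `size` (term, count) pairs for one shard, as that shard would report them."""
    return Counter(shard_docs).most_common(size)

def merged_facet(shards, size):
    """Sum the per-shard top lists and return the overall top `size` terms."""
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_terms(docs, size):
            merged[term] += count
    return merged.most_common(size)

# Made-up data: "49296" is the local winner on only two of three shards.
shards = [
    ["49296"] * 9 + ["10001"] * 2,
    ["49296"] * 8 + ["10002"] * 3,
    ["10003"] * 10 + ["49296"] * 7,
]

# size=1: the third shard reports only "10003", so 7 documents go uncounted.
approx = merged_facet(shards, size=1)

# Larger size: every shard now reports "49296"; keep just the top entry.
better = merged_facet(shards, size=3)[:1]

print(approx)  # [('49296', 17)]
print(better)  # [('49296', 24)]
```

Asking for more terms than you intend to display and trimming the list on the client is one way to apply the advice above; it costs a larger response from each shard.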
On Tuesday, May 24, 2011 at 4:52 AM, stratospark wrote:

Create a simplified sample and the results are even stranger.

I also noticed it wasn't picking up my "include_in_all" option in the
dynamic template, which I think I need to generate facets on
multifields?

https://gist.github.com/987814

Thanks so much for the help!

-pat


