Does a query always behave as a facet filter?


(tuneladora) #1

Hi!

I just noticed that every time I ask for a facet, my whole query
behaves as if it were a facet_filter: the facets seem to be calculated
first for the whole index, and the query is then used to filter the
results, instead of the other way around. I'm not sure if this is an
issue or simply the way ES works.

I ran the following experiment on a 10 million doc index.

First I ask for documents with an impossible date range:

curl -XPOST 'http://localhost:9200/ng_test_10m/_search?search_type=count&pretty=1' -d'
{
  "size": 0,
  "query": {
    "range": {
      "date": {
        "from": "2011-07-01T13:00:00Z",
        "to": "2011-06-30T14:30:00Z"
      }
    }
  }
}
'
It comes back in 2 milliseconds with no results:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}

But if I take the same query and add a facet on some field (its values
are arrays of integers):

curl -XPOST 'http://localhost:9200/ng_test_10m/_search?search_type=count&pretty=1' -d'
{
  "size": 0,
  "query": {
    "range": {
      "date": {
        "from": "2011-07-01T13:00:00Z",
        "to": "2011-06-30T14:30:00Z"
      }
    }
  },
  "facets": {
    "categories": {
      "terms": {
        "field": "categories"
      }
    }
  }
}
'

It takes almost 20 seconds to come back, still with no results:

{
  "took" : 19669,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "facets" : {
    "categories" : {
      "_type" : "terms",
      "missing" : 0,
      "terms" : [ ]
    }
  }
}

I don't know whether I should open an issue or whether this is the
expected behavior, perhaps to make changing the facet scope easy?
Either way, it is a performance problem if facet time grows linearly
with the size of the index, regardless of how many documents the query
actually matches.

Thanks


(Shay Banon) #2

Facets are only computed on the docs that match the query. In your case,
since the query does not match any docs, no facet computation should be
required. Note that the first request that computes a facet on a field can
take some time, because the field values have to be loaded into memory. Is
the second / third request (if you have replicas) still slow?
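
One way to check this is to repeat the facet request a few times and compare the reported "took" values. The following is only a minimal sketch: the helper names extract_took and time_facet_request are hypothetical, and the host, index, and field are simply the ones from the original post.

```shell
#!/bin/sh
# Hypothetical helper: print the "took" value (in milliseconds) found in a
# search response read from stdin.
extract_took() {
  sed -n 's/.*"took"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: send the facet request from the original post N times
# (default 3) and print "took" for each run. Assumes the node and index used
# in the thread (localhost:9200, ng_test_10m).
time_facet_request() {
  runs="${1:-3}"
  i=1
  while [ "$i" -le "$runs" ]; do
    curl -s -XPOST 'http://localhost:9200/ng_test_10m/_search?search_type=count' -d '
    {
      "size": 0,
      "query": {
        "range": {
          "date": {
            "from": "2011-07-01T13:00:00Z",
            "to": "2011-06-30T14:30:00Z"
          }
        }
      },
      "facets": {
        "categories": { "terms": { "field": "categories" } }
      }
    }' | extract_took
    i=$((i + 1))
  done
}
```

Calling time_facet_request 3 prints one number per run. If only the first number is large, the time is going into the initial load of the field values rather than into the facet computation itself; with replicas, the first request to reach each shard copy may pay that cost separately.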

(In reply to the message from tuneladora of Wed, Jul 13, 2011, 6:11 PM, shown in full as #1.)

