Facets only when the number of hits is in a given range


(Ludovic Levesque) #1

Hi Shay,

I don't know how faceting works internally, but would it be possible to
compute facets only when the number of hits is below a given threshold?

Something like this, for example:

"facets": {
  "facet1": {
    "terms": {
      "field": "field1",
      "size": 100
    },
    "only_when": {
      "hits": {"lte": 10000}
    }
  },
  "facet2": {
    "terms": {
      "field": "field2",
      "size": 10
    }
  }
}

On some fields, computing facets adds a little overhead, and that work
is wasted when there are many hits, because the facet results are not
useful to us in that case.

We can do this in the application, but it would be great to avoid the
extra round trips to the cluster.
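For reference, the application-side approach could look roughly like the sketch below. It assumes a hypothetical `search` callable that sends a request body to the cluster and returns the parsed response, and it assumes the response exposes the hit count as `hits.total`. The `only_when` clause is the syntax proposed in this post, not an existing Elasticsearch option, so it is stripped before anything is sent.

```python
import copy


def split_facets(facets):
    """Separate unconditional facets from those carrying the proposed "only_when" clause."""
    always, guarded = {}, {}
    for name, spec in facets.items():
        spec = copy.deepcopy(spec)  # leave the caller's definition untouched
        condition = spec.pop("only_when", None)  # hypothetical clause, never sent to the cluster
        if condition is None:
            always[name] = spec
        else:
            guarded[name] = (spec, condition)
    return always, guarded


def hits_allowed(condition, total_hits):
    """Check the "hits" bounds ("lte" and/or "gte") of an only_when clause."""
    limits = condition.get("hits", {})
    if "lte" in limits and total_hits > limits["lte"]:
        return False
    if "gte" in limits and total_hits < limits["gte"]:
        return False
    return True


def search_with_conditional_facets(search, query, facets):
    """Run the search, then a second facet-only request if any guarded facet qualifies."""
    always, guarded = split_facets(facets)

    first_body = {"query": query}
    if always:
        first_body["facets"] = always
    first = search(first_body)
    total = first["hits"]["total"]  # assumed response shape

    wanted = {
        name: spec
        for name, (spec, condition) in guarded.items()
        if hits_allowed(condition, total)
    }
    if wanted:
        # Second round trip: same query, no hits returned, only the guarded facets.
        second = search({"query": query, "size": 0, "facets": wanted})
        first.setdefault("facets", {}).update(second.get("facets", {}))
    return first
```

The second request is only issued when at least one guarded facet passes its condition, so searches with many hits cost a single round trip.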

Another idea, though maybe more difficult to implement: compute facets
on some fields only when the number of distinct terms in another field
is below a given number. For example, facet on subcategories only when
there are fewer than 5 distinct top categories, or facet on prices only
when there is a single product category.
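Done in the application, this second idea might be sketched as below. The rule format (`when_facet`, `limit`, `facets`) and the field names are made up for illustration, and the response is assumed to list each terms facet's entries under `facets.<name>.terms`. Since a terms facet returns at most `size` entries, the guard facet has to be requested with a `size` of at least the limit, otherwise "fewer than the limit" cannot be told apart from "truncated".

```python
def guard_facet(field, limit):
    """Terms facet used only to count distinct terms, up to `limit` of them."""
    return {"terms": {"field": field, "size": limit}}


def distinct_terms(response, facet_name):
    """Number of distinct terms returned for a facet (capped by its requested size)."""
    return len(response["facets"][facet_name]["terms"])  # assumed response shape


def follow_up_facets(response, rules):
    """Collect the facets whose guard facet returned fewer than `limit` distinct terms.

    Each rule is a dict with the hypothetical keys:
      "when_facet": name of the guard facet in the first response
      "limit":      the dependent facets are wanted only below this many terms
      "facets":     facet definitions to request in the second round trip
    """
    wanted = {}
    for rule in rules:
        if distinct_terms(response, rule["when_facet"]) < rule["limit"]:
            wanted.update(rule["facets"])
    return wanted
```

With the examples above, one rule would use a limit of 5 on a top-category guard facet to unlock a subcategory facet, and another a limit of 2 on a category guard facet to unlock a price facet. A non-empty result from `follow_up_facets` is what the application would send as the `facets` section of a second request.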

Regards,
Ludo


(Shay Banon) #2

Facets are currently aggregated as part of the search process itself, which
makes them really fast with low overhead, but it also means they can't depend
on data that is only available once the search has finished (like the number
of hits).
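A toy model of that trade-off, which is only an illustration and not Elasticsearch's actual code: facet counts are updated for each matching document during the same pass that counts the hits, so the total only exists after the facet work has already been done. The `docs`, `matches`, and `facet_fields` names are assumptions for the sketch.

```python
from collections import Counter


def search_with_facets(docs, matches, facet_fields):
    """Single pass over the documents: count hits and facet terms together."""
    total_hits = 0
    facet_counts = {field: Counter() for field in facet_fields}

    for doc in docs:
        if not matches(doc):
            continue
        total_hits += 1
        # The facet work happens here, per matching document, while
        # total_hits is still a partial count.
        for field in facet_fields:
            if field in doc:
                facet_counts[field][doc[field]] += 1

    # Only at this point is the final number of hits known.
    return total_hits, facet_counts
```

Gating a facet on the final hit count would mean either a second pass over the matches or throwing away counts that were already computed.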

-shay.banon

(In reply to Ludovic Levesque's message of Tue, Nov 16, 2010 at 11:50 AM)


(Ludovic Levesque) #3

OK, so it's better to do one more round trip when it's really needed.

Thanks again.

(In reply to Shay Banon's message of Wed, Nov 17, 2010 at 11:08 AM)

