Facet Filter limit doesn't improve faceting performance?

Hello,

While doing performance testing that involves faceting on a high
cardinality field
(see https://groups.google.com/d/msg/elasticsearch/ePJgCtBpyrs/39pzczoRokoJ
) we tried using the Facet Filter limit like so:

"facets":{"tags":{"terms":{"field":"tags"},"facet_filter":{"limit":{"value":100}}}}}

To our surprise this had no effect on performance. Is this normal, or at
least expected, for a field like "tags" which, as you'd suspect, has a
variable number of tokens per field and a high number of distinct tokens
across the whole index?

Thanks,
Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Scalable Performance Monitoring - http://sematext.com/spm/index.html
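
For reference, the fragment in the question can be written out as a complete search request body. This is only a minimal sketch in Python using the standard json module. The facet name, field, and limit value come from the post; the match_all query is an assumption, since the post does not show the query that was used.

```python
import json


def build_facet_request(field="tags", limit=100):
    """Build a search body with a terms facet narrowed by a limit facet_filter."""
    return {
        # Assumption: the thread does not show the query; match_all is a placeholder.
        "query": {"match_all": {}},
        "facets": {
            "tags": {
                "terms": {"field": field},
                # The limit filter caps how many documents feed the facet.
                # It does not change how the field data itself gets loaded.
                "facet_filter": {"limit": {"value": limit}},
            }
        },
    }


body = build_facet_request()
print(json.dumps(body, indent=2))
```

The body can be sent to the _search endpoint with any HTTP client. Note that the filter only narrows the set of documents being counted, which matters for the answers below.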

It will not affect things if the problem you have is not the actual facet
calculation, but constant loading of the facet data. Are you indexing data
while running the facet requests? Is the node under load? Do you have any
special configuration for the field data?

Hi,

On Friday, May 25, 2012 4:42:44 PM UTC-4, kimchy wrote:

It will not affect things if the problem you have is not the actual facet
calculation, but constant loading of the facet data.

By "constant loading of facet data" do you mean reading docs/field values
from new index segments that are being created due to new documents being
added?

Are you indexing data while running the facet requests? Is the node under
load? Do you have any special configuration to the field data?

Yes, indexing was happening while search performance tests were running.
These nodes have 16 CPU cores, the load was under 16 and CPUs were not 100%
utilized.
I did not test things with indexing turned off, but that would not be
realistic either: I need to see new docs pretty quickly after they are
indexed, so indexing has to be running constantly.

Any special configuration to the field data....? Hm, I don't think so.

Thanks,
Otis

On Sun, May 27, 2012 at 4:24 AM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

By "constant loading of facet data" do you mean reading docs/field values
from new index segments that are being created due to new documents being
added?

Either that, or the field cache is configured as LRU-based or soft
(soft-reference) based, which can cause data to be reloaded each time.
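
To illustrate why an evicting cache can end up reloading data over and over, here is a toy sketch in Python. It is not Elasticsearch code: the LruFieldCache class, the segment names, and the cache sizes are all hypothetical. It only shows that when the working set is slightly larger than the cache, an LRU policy can miss on every single access.

```python
from collections import OrderedDict


class LruFieldCache:
    """Toy LRU cache keyed by segment name; counts simulated loads."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.loads = 0

    def get(self, segment):
        if segment in self.entries:
            self.entries.move_to_end(segment)  # mark as most recently used
            return self.entries[segment]
        self.loads += 1  # cache miss: pretend to read field data from the index
        data = f"field data for {segment}"
        self.entries[segment] = data
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return data


segments = ["seg0", "seg1", "seg2", "seg3"]
small = LruFieldCache(max_entries=3)  # one entry too small for the working set
large = LruFieldCache(max_entries=4)  # fits the whole working set

for _ in range(5):  # five facet requests, each touching every segment
    for seg in segments:
        small.get(seg)
        large.get(seg)

print(small.loads)  # 20: every access reloads
print(large.loads)  # 4: each segment is loaded once
```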

Yes, indexing was happening while search performance tests were running.
These nodes have 16 CPU cores, the load was under 16 and CPUs were not
100% utilized.
I did not test things with indexing turned off, but that would not be
realistic either: I need to see new docs pretty quickly after they are
indexed, so indexing has to be running constantly.

If new data is being indexed with a refresh interval of 1 second, the field
data has to be loaded for each new segment, and all relevant search requests
wait for that load to finish. In 0.20 we will have a warmer API that allows
pre-warming new segments, meaning that search requests will not "suffer"
from loading the data.
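
The effect described above can be sketched with a toy cost model in Python. Everything here is hypothetical: the function name, the millisecond figures, and the simplification that only the first request after a refresh pays the load cost. It is meant only to show how loading can dominate, so that a cheaper facet computation barely moves the total, while pre-warming removes the load cost from the search path.

```python
def total_latency_ms(num_requests, requests_per_refresh, load_ms, compute_ms,
                     prewarm=False):
    """Sum request latencies when each refresh exposes one new, unloaded segment."""
    loaded = set()
    segment = 0
    total = 0
    for i in range(num_requests):
        if i % requests_per_refresh == 0:
            segment += 1  # a refresh made a new segment searchable
            if prewarm:
                loaded.add(segment)  # warmed outside of any search request
        if segment not in loaded:
            total += load_ms  # this request pays for loading the field data
            loaded.add(segment)
        total += compute_ms  # the facet calculation itself
    return total


# Made-up numbers: loading costs 200 ms, faceting 5 ms (2 ms with a limit filter).
baseline = total_latency_ms(100, 10, load_ms=200, compute_ms=5)
limited = total_latency_ms(100, 10, load_ms=200, compute_ms=2)
warmed = total_latency_ms(100, 10, load_ms=200, compute_ms=5, prewarm=True)

print(baseline, limited, warmed)  # 2500 2200 500
```

With these assumed costs, more than halving the facet computation saves about 12% overall, while pre-warming saves 80%.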

Hi,

On Tuesday, May 29, 2012 2:36:45 PM UTC-4, kimchy wrote:

Either that, or configuring the field cache with LRU based / soft based
configuration, which can cause data to be reloaded each time.

Aha. I think we used soft during our tests.
By "reloaded each time" I assume you mean "reloaded from the index each time
data is needed and is not in the cache".
That said, I don't think we saw any cache loading activity in SPM. We'll
need to recheck to be sure. Thanks for this pointer.

If new data is being indexed with a refresh interval of 1 second, the field
data has to be loaded for each new segment, and all relevant search requests
wait for that load to finish. In 0.20 we will have a warmer API that allows
pre-warming new segments, meaning that search requests will not "suffer"
from loading the data.

So with regard to the original subject, your main point here is that
indexing while doing performance tests and having a low index refresh
interval probably have such a dominant negative effect on performance
that using the Facet Filter limit doesn't show any positive effect?

Thanks,
Otis

On Tue, May 29, 2012 at 9:18 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

So with regard to the original subject, your main point here is that
indexing while doing performance tests and having a low index refresh
interval probably have such a dominant negative effect on performance
that using the Facet Filter limit doesn't show any positive effect?

Yes, the time is spent by the actual search requests loading the field data.
