Terms facet explodes memory

Jurgen_kartnaller · August 16, 2011, 2:15pm

The terms facet seems to read the terms field from ALL documents into the
field cache, not only from the documents in the query result.

This also happens if the query returns no results for the facet.

In our case this results in:
java.lang.OutOfMemoryError: Java heap space
which then leaves the cluster unresponsive (we need to restart all ES
instances).

As I understand it, the facet should only read fields contained in the
result of the query.

Is there a way to avoid this problem?

Jürgen

kimchy · August 17, 2011, 1:57am

Facets cause fields to be completely loaded into memory (it's documented in
each facet). The reason for that is performance: you don't want to go to
disk for each hit you potentially have in order to fetch the value.
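
To make the trade-off concrete, here is a toy Python model. It is not Elasticsearch code, and the class, function, and field names are invented for illustration. It only sketches why the first facet request on a field pays for every document in the index, even when the query matches nothing.

```python
from collections import Counter


class ToyFieldCache:
    """Toy stand-in for a field cache (hypothetical, for illustration only)."""

    def __init__(self, index):
        self.index = index   # list of dicts, one per document
        self.loaded = {}     # field name -> list of values for ALL documents

    def values_for(self, field):
        # First use of a field loads the value of every document, not just hits.
        if field not in self.loaded:
            self.loaded[field] = [doc.get(field) for doc in self.index]
        return self.loaded[field]


def terms_facet(cache, field, hit_ids):
    """Count terms for the hits from the in-memory values (no per-hit disk read)."""
    values = cache.values_for(field)
    return Counter(values[i] for i in hit_ids if values[i] is not None)


index = [{"tag": "a"}, {"tag": "b"}, {"tag": "a"}, {"tag": "c"}]
cache = ToyFieldCache(index)
counts = terms_facet(cache, "tag", hit_ids=[])   # a query with no hits
loaded_entries = len(cache.loaded["tag"])        # still one entry per document
```

In this model the memory cost depends on the index size and the number of faceted fields, not on how many documents a given query returns.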

Jurgen_kartnaller · August 17, 2011, 5:26am

This basically means I need more memory.

--
http://www.sfgdornbirn.at
http://www.mcb-bregenz.at

kimchy · August 17, 2011, 11:45am

Yea :). Though, I do want to try to allow for other "cache" mechanisms that
would avoid having all values in memory but still give good perf when
doing facets, but it's down the road...

Jurgen_kartnaller · August 17, 2011, 5:45pm

Thanks, Shay

We are now using m2.xlarge with 30GB for ES. We will see tomorrow how it works.

We will have 5.5T documents as a start and will run a lot of facet
queries. We are also implementing our own specific facets to fulfill customer
requirements.

Stephane_Raux · September 30, 2011, 8:19am

Hi,

I have the same problem.

The point is that once all the fields are loaded into memory for a terms
facet, the memory is never released, so if I run several terms facets on
several fields, I end up with an OutOfMemoryError.
Would it be possible to provide a mechanism to free the
memory taken by the fields?
Or to check whether the node has enough memory before loading the fields?

Stéphane

Jurgen_kartnaller · September 30, 2011, 9:01am

To solve this problem we now have our own facet implementation, which does
not use the field cache.

For us this is possible because we always have a small query result set as
input for the facets.
The query filters about 100k documents out of 8G.
With the 100k docs the facet is still fast enough without a field cache.

We did this only for fields containing strings; we still use the cache for
date and numerical fields.
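
The idea of faceting only over the hits can be sketched in a few lines of Python. This is not the plugin described above, whose code is not shown in this thread. The function name and the `load_stored_value` callback are hypothetical stand-ins for a per-document stored-field read.

```python
from collections import Counter


def facet_without_cache(hit_ids, load_stored_value, size=10):
    """Count terms by fetching the value of each hit individually.

    load_stored_value is a hypothetical callback standing in for a
    per-document stored-field read. The cost grows with the number of
    hits, not with the size of the index.
    """
    counts = Counter()
    for doc_id in hit_ids:
        value = load_stored_value(doc_id)
        if value is None:
            continue
        # A stored field may hold one value or a list of values.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts.most_common(size)


stored = {1: "red", 2: ["red", "blue"], 3: None, 4: "green"}
top = facet_without_cache([1, 2, 3], stored.get, size=2)
```

Nothing is kept in memory between requests here, which is why this approach only pays off when the result set stays small.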

Jürgen

Stephane_Raux · September 30, 2011, 9:14am

It seems to be a good solution for my use case; I am also doing facets
on small subsets of my documents.

Did you implement it with the Java API? Is it available somewhere?

Stéphane

Jurgen_kartnaller · September 30, 2011, 12:57pm

It is implemented as a plugin but is not yet publicly available.
I also made a simple distinct facet, also for small data sets.

I will try to make it public if I find the time.

Jürgen

Stephane_Raux · October 2, 2011, 6:38pm

Thank you for the plugin, I hope you will find some time to make it public!

Anyway, would it be possible to provide a way to free the memory taken
by the values of the facets, maybe with an explicit call on a given
field or by providing an optional timeout?

Another solution may be to provide a slower implementation for
requesting facets on small subsets of documents?

Should I open a feature request or an issue?

Stéphane

kimchy · October 2, 2011, 10:13pm

There is a way to clear the field data cache (there is an API for that,
called clear cache), but not for a specific field. Open an
issue for that one; it's a good idea to have it.

Regarding the slower impl, I am guessing that it's implemented either by
going to stored fields, or by extracting the stored source, parsing it, and
fetching the value. That's going to be expensive, but for a small result set
it might make sense. You can actually do that (for some facets) by using the
script option, since you can use either _source.obj.field (loads the source and
parses it automatically) or _fields.field_name (fetches a stored field).
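
As a rough sketch of the script option, the Python snippet below builds a search body whose terms facet reads its values through a script instead of the field cache. The `script_field` key and the overall body shape are how I recall the legacy terms facet documentation; they are assumptions to verify against your Elasticsearch version. The facet name `tags` and the helper function are made up for the example.

```python
import json


def script_terms_facet(facet_name, script_field, size=10):
    """Build a search body whose terms facet reads values via a script.

    The "script_field" key is an assumption based on the legacy terms facet
    documentation; check it against the version you are running.
    """
    return {
        "query": {"match_all": {}},
        "facets": {
            facet_name: {
                "terms": {"script_field": script_field, "size": size}
            }
        },
    }


# _source.obj.field is the script expression mentioned in the post above.
body = script_terms_facet("tags", "_source.obj.field")
payload = json.dumps(body)
```

The resulting JSON would be sent as the body of a normal search request. As noted above, the source is loaded and parsed for every hit, so this is only reasonable for small result sets.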

Jurgen_kartnaller · October 3, 2011, 5:09am

Exactly, I'm doing it on stored fields. Also using it for more complex
custom facets.

Stephane_Raux · October 3, 2011, 9:52am

My bad, I didn't notice the clear cache API and the field_data
option. I think it will be enough to solve my problem.
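
For reference, a clear cache call limited to field data could be built roughly as follows. Only the API name and the field_data option come from this thread. The `/_cache/clear` path, the `POST` method, the host, and the index name `my_index` are assumptions based on the legacy documentation and should be checked against your version.

```python
from urllib.parse import urlencode


def clear_cache_url(base_url, index=None, field_data=True):
    """Build the URL for a clear cache request restricted to field data.

    The path and parameter spelling are assumptions taken from the legacy
    clear cache documentation; the request itself would be sent as a POST.
    """
    path = f"/{index}/_cache/clear" if index else "/_cache/clear"
    query = urlencode({"field_data": str(field_data).lower()})
    return f"{base_url.rstrip('/')}{path}?{query}"


url = clear_cache_url("http://localhost:9200", index="my_index")
```

Clearing frees the heap held by the loaded field values, but the next facet request on the same field will load them again.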

I have opened an issue:

Thanks,

Stéphane

2011/10/3 Jürgen kartnaller juergen.kartnaller@gmail.com:

On Mon, Oct 3, 2011 at 12:13 AM, Shay Banon kimchy@gmail.com wrote:

There is a way to clear the field data cache (there is an API for that
called clear cache), but not specifically for a specific field. Open an
issue for that one, its a good idea to have it.
Regarding the slower impl, I am guessing that its implemented either by
going to stored fields, or by extracting the stored source, parsing it, and
fetching the value. Thats going to be expensive, but for a small result set,
it might make sense. You can actually do that (for some facets) by using the
script option, since you can do both _source.obj.field (loads source and
parse it automatically) or _fields.field_name (fetches a stored field).

Exactly, I'm doing it on stored fields. Also using it for more complex
custom facets.

andym · November 4, 2011, 4:18pm

Hi,
I am running into the same facet / OOM problem. In our case we have
around 7M docs (10G index size with 5 shards, 2 replicas running on 2
m1.large instances) and 7 facets that we actively query against.
Unfortunately the number of elements in one of the fields that we facet
on got very large (probably tens of thousands) and we get OOM.

Adding one more machine rebalances the shards very nicely (I
know with our config we can get up to 10) and the OOM problem goes
away, but I’d like to explore the possibility of reducing the
dimensionality of the facets without re-indexing the whole thing at
the moment. Is it possible through scripting (or other means) to
include in the facet calculations only those facet terms that match
specific criteria (e.g. consist of one word, or start with a*, etc.)?
In other words, given a query such as

{
  "query": {
    "query_string": {
      "query": "hello"
    }
  },
  "facets": {
    "myfacet1": {
      "terms": {
        "field": "myfacet1",
        "size": 50
      }
    }
  }
}

is it possible to include a scripting section in it that would instruct
the facet to load (and cache) only the subset of facet terms matching a
criterion (e.g. consisting of one word)?

Thank you!

-- Andy

Jurgen_kartnaller · November 4, 2011, 7:10pm

On Fri, Nov 4, 2011 at 5:18 PM, andym imwellnow@gmail.com wrote:

Is it possible to include a scripting section in it that would instruct
the facet to load (and cache) only the subset of facet terms matching a
criterion (e.g. consisting of one word)?

No, the facet is always pulling the full index of the field into memory.
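
For a small result set, the alternative mentioned earlier in the thread (going to stored fields instead of the field cache) can be approximated on the client side. What follows is a rough sketch of that idea only, not the plugin's code: it assumes each hit carries the requested stored field under a `fields` key, and the helper name `count_terms` and the sample hits are made up for illustration.

```python
from collections import Counter


def count_terms(hits, field, size=50):
    """Count the values of a stored field over a small list of hits."""
    counts = Counter()
    for hit in hits:
        value = hit.get("fields", {}).get(field)
        if value is None:
            continue  # this hit has no value for the field
        # A field may hold a single value or a list of values.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts.most_common(size)


# Hypothetical hits, shaped like a search response that returns stored fields.
sample_hits = [
    {"_id": "1", "fields": {"myfacet1": "red"}},
    {"_id": "2", "fields": {"myfacet1": ["red", "blue"]}},
    {"_id": "3", "fields": {}},
]
top = count_terms(sample_hits, "myfacet1")
print(top)
```

This costs one stored-field read per hit, so it is reasonable for thousands of hits and not for millions, and it needs every matching hit to be fetched, not just the first page.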

--
http://www.sfgdornbirn.at
http://www.mcb-bregenz.at

andym · December 3, 2011, 5:04pm

Hi Jürgen,

Is there a chance you can post your facet implementation
that does not use the field cache somewhere? If the code is not “release
ready”, that’s perfectly OK, as otherwise I will probably end up redoing
work you have already done. In one of our scenarios
we’ll have a rather small result set, a few thousand items, so
it should give us very reasonable performance when retrieving data
from stored fields directly.

Alternatively, what is the way to aggregate values from stored fields
through scripting, as Shay suggests? From the docs I see I can retrieve
a particular stored field value (script fields,
reference/api/search/script-fields.html), but I could not find any
example of how to aggregate these values over the returned document
set.

Thanks!

-- Andy
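
One possible reading of the script option Shay describes, sketched as a request body built in Python. The assumptions are: `script_field` is the terms facet option for script-provided terms (as far as I recall from the facet docs of that era, so verify it for your version), `_source.myfacet1` follows the `_source.obj.field` form given in this thread, and the helper `script_terms_facet` is made up for illustration. Nothing is sent to a cluster.

```python
import json


def script_terms_facet(query_text, facet_name, script, size=50):
    """Build a search body whose terms facet takes its terms from a script."""
    return {
        "query": {"query_string": {"query": query_text}},
        "facets": {
            facet_name: {
                # The script is evaluated per hit, so the field is read from
                # the source of each matching document.
                "terms": {"script_field": script, "size": size}
            }
        },
    }


body = script_terms_facet("hello", "myfacet1", "_source.myfacet1")
print(json.dumps(body, indent=2))
```

A stored field would be addressed with the `_fields` form from the thread instead of `_source`; the exact script syntax for that is an assumption to check against the script fields docs.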
