Terms facet explodes memory


(Jürgen kartnaller) #1

The terms facet seems to read the terms field of ALL documents into the
field cache, not only of the documents in the query result.

This also happens if the query returns no results for the facet.

In our case this results in:
java.lang.OutOfMemoryError: Java heap space
which then leaves the cluster unresponsive (we need to restart all ES
instances).

As I understand it, the facet should only read fields of the documents
contained in the query result.

Is there a way to avoid this problem?

Jürgen


(Shay Banon) #2

Facets cause fields to be loaded completely into memory (it's documented for
each facet). The reason is performance: you don't want to go to disk for
every hit you potentially have just to fetch its value.
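A minimal sketch of why this trade-off exists, assuming a single-valued string field and a hypothetical `FieldCacheSketch` class (illustrative only, not Elasticsearch's actual field data code): the values of every document are loaded once into an array indexed by doc id, so counting terms for any hit set costs one array read per hit, but the array is sized by the whole index, even when the query matches nothing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical "uninverted" field cache for one single-valued string field. */
final class FieldCacheSketch {
    private final String[] terms;  // ordinal -> term, sorted
    private final int[] ordinals;  // docId -> ordinal, -1 if the doc has no value

    /** values[docId] holds the field value of EVERY document in the index (or null). */
    FieldCacheSketch(String[] values) {
        TreeSet<String> unique = new TreeSet<>();
        for (String v : values) {
            if (v != null) unique.add(v);
        }
        terms = unique.toArray(new String[0]);
        Map<String, Integer> ordOf = new HashMap<>();
        for (int i = 0; i < terms.length; i++) ordOf.put(terms[i], i);
        // One slot per indexed document, not per hit: memory grows with the index size.
        ordinals = new int[values.length];
        for (int doc = 0; doc < values.length; doc++) {
            ordinals[doc] = values[doc] == null ? -1 : ordOf.get(values[doc]);
        }
    }

    /** Counts terms over the matching doc ids; each hit is a single array read, no disk access. */
    Map<String, Integer> facet(int[] hits) {
        int[] counts = new int[terms.length];
        for (int doc : hits) {
            int ord = ordinals[doc];
            if (ord >= 0) counts[ord]++;
        }
        Map<String, Integer> result = new TreeMap<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) result.put(terms[i], counts[i]);
        }
        return result;
    }

    /** Heap used by the ordinal array alone: 4 bytes per document in the index. */
    long ordinalBytes() {
        return 4L * ordinals.length;
    }
}
```

Note that an empty hit set still requires the full cache to exist, which matches the behaviour reported in the first post.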


(Jürgen kartnaller) #3

This basically means I need more memory.

--
http://www.sfgdornbirn.at
http://www.mcb-bregenz.at


(Shay Banon) #4

Yeah :). Though I do want to try to allow for other "cache" mechanisms that
would not require having all values in memory but would still give good
performance for facets, but that's down the road...


(Jürgen kartnaller) #5

Thanks, Shay

We are now using m2.xlarge with 30GB for ES. We'll see tomorrow how it works.

We will have 5.5T documents as a start and will run a lot of facet
queries. We are also implementing our own specific facets to fulfill customer
requirements.


(Stéphane Raux) #6

Hi,

I have the same problem.

The point is that once all the field values are loaded into memory for a terms
facet, the memory is never released, so if I run several terms facets on
several fields, I end up with an OutOfMemoryError.
Would it be possible to provide a mechanism to free the memory taken by
the fields?
Or to check whether the node has enough memory before loading the fields?
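One way the requested mechanism could look, sketched under loud assumptions: `BudgetedFieldCache`, its byte budget, and the caller-supplied size estimate are all hypothetical and are not an Elasticsearch API. It refuses a load that would exceed the budget and can evict a single field without clearing everything.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-field cache with a memory budget and single-field eviction. */
final class BudgetedFieldCache {
    private final long budgetBytes;
    private final Map<String, long[]> data = new HashMap<>();  // field -> loaded values (stand-in)
    private final Map<String, Long> sizes = new HashMap<>();   // field -> estimated bytes
    private long used;

    BudgetedFieldCache(long budgetBytes) {
        this.budgetBytes = budgetBytes;
    }

    /** Returns the cached field, loading it only if the estimate fits in the remaining budget. */
    long[] get(String field, long estimatedBytes, Supplier<long[]> loader) {
        long[] cached = data.get(field);
        if (cached != null) return cached;
        if (used + estimatedBytes > budgetBytes) {
            // Check BEFORE loading, instead of failing later with an OutOfMemoryError.
            throw new IllegalStateException("loading '" + field + "' would exceed the cache budget");
        }
        long[] loaded = loader.get();
        data.put(field, loaded);
        sizes.put(field, estimatedBytes);
        used += estimatedBytes;
        return loaded;
    }

    /** Frees the memory accounted to one field; other fields stay cached. */
    void evict(String field) {
        Long size = sizes.remove(field);
        if (size != null) {
            data.remove(field);
            used -= size;
        }
    }

    long usedBytes() {
        return used;
    }
}
```

The hard part in practice is the size estimate; here it is simply trusted from the caller.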

Stéphane


(Jürgen kartnaller) #7

To solve this problem, we now have our own facet implementations, which do
not use the field cache.

This is possible for us because the input to our facets is always a small
query result set.
The query filters about 100k documents out of 8G.
With 100k docs the facet is still fast enough without a field cache.

We did this only for string fields; we still use the cache for date and
numeric fields.
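A minimal sketch of this approach, assuming a hypothetical `StoredFieldTermsFacet` and a caller-supplied lookup standing in for stored-field reads (this is not the plugin code, which is not public): the facet visits only the hits and keeps nothing once it returns, so memory follows the hit count rather than the index size.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntFunction;

/** Hypothetical terms facet that reads each hit's values on demand instead of caching a field. */
final class StoredFieldTermsFacet {
    // Stand-in for a per-document stored-field read (in a real engine, usually a disk access).
    private final IntFunction<List<String>> storedValues;

    StoredFieldTermsFacet(IntFunction<List<String>> storedValues) {
        this.storedValues = storedValues;
    }

    /** Counts every value of every hit; no per-index structure is built or retained. */
    Map<String, Integer> facet(int[] hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : hits) {
            List<String> values = storedValues.apply(doc);
            if (values == null) continue;  // hit without a value for this field
            for (String v : values) counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }
}
```

The price is one lookup per hit, which the field cache avoids; that is why this only pays off for small result sets such as the ~100k documents mentioned here.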

Jürgen


(Stéphane Raux) #8

It seems to be a good solution for my use case; I am also running facets
on small subsets of my documents.

Did you implement it with the Java API? Is it available somewhere?

Stéphane


(Jürgen kartnaller) #9

It is implemented as a plugin but is not yet publicly available :(
I also made a simple distinct facet, also for small data sets.

I will try to make it public if I find the time.

Jürgen


(Stéphane Raux) #10

Thank you for the plugin, I hope you will find some time to make it public!

Anyway, would it be possible to provide a way to free the memory taken
by the facet values, maybe with an explicit call on a given field or
with an optional timeout?

Another solution might be a slower implementation for requesting facets
on small subsets of documents.

Should I open a feature request or an issue?

Stéphane


(Shay Banon) #11

There is a way to clear the field data cache (there is an API for that
called clear cache), but not for a specific field. Open an issue for that
one; it's a good idea to have it.

Regarding the slower impl, I am guessing it would be implemented either by
going to stored fields, or by extracting the stored source, parsing it, and
fetching the value. That's going to be expensive, but for a small result set
it might make sense. You can actually do that (for some facets) by using the
script option, since you can use both _source.obj.field (loads the source and
parses it automatically) and _fields.field_name (fetches a stored field).
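To make the _source.obj.field variant concrete, here is a plain-Java sketch, assuming each hit's source has already been parsed into nested maps; `SourcePathFacet` is hypothetical and does not reproduce Elasticsearch's script engine. It resolves the dotted path per hit and counts what it finds, so the cost scales with the number of hits (plus the parsing, which is not modelled here).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical facet over already-parsed _source documents, addressed by a dotted path. */
final class SourcePathFacet {
    /** Walks nested maps along a path like "obj.field"; returns null if any step is missing. */
    static Object resolve(Map<String, Object> source, String path) {
        Object current = source;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) return null;
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    /** Counts the values (single value or list) found at the path across the hits' sources. */
    static Map<String, Integer> facet(List<Map<String, Object>> hitSources, String path) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Object> source : hitSources) {
            Object value = resolve(source, path);
            if (value instanceof List) {
                for (Object v : (List<?>) value) counts.merge(String.valueOf(v), 1, Integer::sum);
            } else if (value != null) {
                counts.merge(String.valueOf(value), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Reading a stored field (the _fields.field_name variant) avoids the parse step but still pays one lookup per hit.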


(Jürgen kartnaller) #12

Exactly, I'm doing it on stored fields. I'm also using it for more complex
custom facets.


(Stéphane Raux) #13

My bad, I didn't notice the clear cache API and its field_data
option. I think that will be enough to solve my problem.

I have opened an issue:

Thanks,

Stéphane

2011/10/3 Jürgen kartnaller juergen.kartnaller@gmail.com:

On Mon, Oct 3, 2011 at 12:13 AM, Shay Banon kimchy@gmail.com wrote:

There is a way to clear the field data cache (there is an API for that
called clear cache), but not specifically for a specific field. Open an
issue for that one, its a good idea to have it.
Regarding the slower impl, I am guessing that its implemented either by
going to stored fields, or by extracting the stored source, parsing it, and
fetching the value. Thats going to be expensive, but for a small result set,
it might make sense. You can actually do that (for some facets) by using the
script option, since you can do both _source.obj.field (loads source and
parse it automatically) or _fields.field_name (fetches a stored field).

Exactly, I'm doing it on stored fields. Also using it for more complex
custom facets.

On Sun, Oct 2, 2011 at 8:38 PM, Stéphane Raux stephane.raux@gmail.com
wrote:

Thank you for the plugin, I hope you will find some time to make it
public!

Anyway, would it be possible to provide a way to free the memory taken
by the values of the facets, maybe with an explicit call on a given
field or by providing an optional timeout?

An other solution may be to implement a slower implementation for
requesting facets on small subsets of documents?

Should I open a feature or an issue?

Stéphane

---------- Forwarded message ----------
From: Jürgen kartnaller juergen.kartnaller@gmail.com
Date: 2011/9/30
Subject: Re: terms facet explodes memory
To: elasticsearch@googlegroups.com

It is implemented as a plugin but is not yet public available :frowning:
I also made a simple distinct facet, alos for small data sets.
I will try to make it public if I find the time.
Jürgen

On Fri, Sep 30, 2011 at 11:14 AM, Stéphane Raux stephane.raux@gmail.com
wrote:

It seems be be a good solution for my use case, I am also doing facets
with small subsets of my documents.

Did you implement it with the Java API ? Is it available somewhere ?

Stéphane

2011/9/30 Jürgen kartnaller juergen.kartnaller@gmail.com:

To solve this problem we now have our own facet implementations, which
do not use the field cache.
For us this is possible because we always have a small query result set
as input for the facets.
The query filters about 100k documents out of 8G.
With the 100k docs the facet is still fast enough without a field cache.
We did this only for fields containing strings, still using the cache
for date and numerical fields.
Jürgen
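Jürgen's plugin is not public, so this is only a hypothetical Python sketch of the idea: compute term counts by reading the stored values of the matching documents, instead of going through an index-wide field cache. The hit format (a dict of stored field values, lists for multi-valued fields) and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple, Union

FieldValue = Union[str, List[str]]


def terms_facet_from_stored(hits: Iterable[Dict[str, FieldValue]], field: str,
                            size: int = 10) -> List[Tuple[str, int]]:
    """Count terms of `field` over the matching documents only.

    Work and memory scale with the number of hits and their distinct
    terms, not with the total number of documents in the index.
    """
    counts: Counter = Counter()
    for doc in hits:
        value = doc.get(field)
        if value is None:
            continue  # this document has no value for the field
        counts.update(value if isinstance(value, list) else [value])
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
```

The trade-off is the one discussed in this thread: each hit costs a stored-field read, which is acceptable for ~100k hits but not for facets over the whole index.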
On Fri, Sep 30, 2011 at 10:19 AM, Stéphane Raux stephane.raux@gmail.com
wrote:

Hi,

I have the same problem.

The point is that once all the fields are loaded in memory for a terms
facet, the memory is never released, so if I run several terms facets on
several fields, I end up with an OutOfMemoryError.
Would it be possible to provide a mechanism to free the memory taken by
the fields?
Or to check whether the node has enough memory before loading the fields?
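A hypothetical Python sketch of the second idea: estimate a field's footprint and refuse to load it when the estimate exceeds a share of the free heap. The memory model (one 4-byte ordinal per document for a single-valued field, plus the distinct term strings) is a simplifying assumption, not how Elasticsearch actually sizes field data, and `free_bytes` is passed in rather than measured.

```python
class InsufficientMemoryError(RuntimeError):
    """Raised instead of loading a field that probably would not fit."""


def estimate_field_bytes(num_docs: int, unique_terms: int, avg_term_bytes: int,
                         bytes_per_ordinal: int = 4) -> int:
    """Rough model: one ordinal slot per document plus the distinct term strings."""
    return num_docs * bytes_per_ordinal + unique_terms * avg_term_bytes


def check_before_loading(field: str, estimated_bytes: int, free_bytes: int,
                         headroom: float = 0.5) -> None:
    """Fail fast if the estimate exceeds `headroom` of the currently free heap."""
    budget = int(free_bytes * headroom)
    if estimated_bytes > budget:
        raise InsufficientMemoryError(
            f"loading {field!r} needs ~{estimated_bytes} bytes, budget is {budget}")
```

Note that in this model the per-document term dominates for large indexes, which matches the observation that memory use depends on the whole index rather than on the query result.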

Stéphane

2011/8/17 Jürgen kartnaller juergen.kartnaller@gmail.com:

Thanks, Shay
We are now using m2.xlarge with 30GB for ES. We will see tomorrow how it
works.
We will have 5.5T documents as a start and will run a lot of facet
queries. We are also implementing our own specific facets to fulfill
customer requirements.
On Wed, Aug 17, 2011 at 1:45 PM, Shay Banon kimchy@gmail.com
wrote:

Yea :). Though, I do want to try and allow for other "cache" mechanisms
that would allow not having all values in memory while still having good
perf when doing facets, but it's down the road...

On Wed, Aug 17, 2011 at 8:26 AM, Jürgen kartnaller
juergen.kartnaller@gmail.com wrote:

This basically means I need more memory.

On Wed, Aug 17, 2011 at 3:57 AM, Shay Banon kimchy@gmail.com
wrote:

Facets cause fields to be completely loaded into memory (it's documented
in each facet). The reason for that is performance: you don't want to go
to disk for each hit you potentially have in order to fetch the value.
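The trade-off Shay describes can be shown with a toy model of an uninverted field: the cache holds one ordinal slot for every document in the index, built before any hit is counted, so per-hit counting is a cheap array lookup but memory is spent even when the query matches nothing. `UninvertedField` is a hypothetical, single-valued simplification, not Elasticsearch's actual data layout.

```python
from collections import Counter
from typing import List, Optional, Sequence


class UninvertedField:
    """Toy single-valued field cache with one ordinal slot per document."""

    def __init__(self, values_by_doc: Sequence[Optional[str]]):
        self.terms: List[str] = sorted({v for v in values_by_doc if v is not None})
        index = {t: i + 1 for i, t in enumerate(self.terms)}  # 0 means "no value"
        # Built for the whole index up front, regardless of which docs a query hits.
        self.ordinals: List[int] = [index[v] if v is not None else 0
                                    for v in values_by_doc]

    def facet(self, hit_doc_ids: Sequence[int]) -> Counter:
        """Count terms for the hits; each hit is an in-memory lookup, no disk access."""
        counts: Counter = Counter()
        for doc_id in hit_doc_ids:
            ordinal = self.ordinals[doc_id]
            if ordinal:
                counts[self.terms[ordinal - 1]] += 1
        return counts
```

A query with zero hits still requires `ordinals` to exist for every document, which is the behaviour reported in the original question.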

On Tue, Aug 16, 2011 at 5:15 PM, Jürgen kartnaller
juergen.kartnaller@gmail.com wrote:

The terms facet seems to read the terms field from ALL
documents into the field cache, not only the fields from the query result.
This also happens if the query returns no results for the facet.
In our case this results in:
java.lang.OutOfMemoryError: Java heap space
which then leads to an unresponsive cluster (we need to restart
all ES instances).
As I understand it, the facet should only read fields contained in
the result of the query.
Is there a way to avoid this problem?
Jürgen

--
http://www.sfgdornbirn.at
http://www.mcb-bregenz.at


(andym) #14

Hi,
I am running into the same facet / OOM problem. In our case we have
around 7M docs (10G index size with 5 shards, 2 replicas running on 2
m1.large instances) and 7 facets that we actively query against.
Unfortunately the number of elements in one of the fields that we facet
on grew very large (probably tens of thousands) and we get OOM.

Adding one more machine rebalances the shards very nicely (I know that
with our config we can go up to 10) and we no longer experience the OOM
problem. Still, I'd like to explore reducing the dimensionality of the
facets without re-indexing the whole thing at the moment: is it possible
through scripting (or other means) to include in facet calculations only
those facet terms that match specific criteria (e.g. consisting of one
word, or starting with a*, etc.)?
In other words, given a query such as

{
  "query": {
    "query_string": {
      "query": "hello"
    }
  },
  "facets": {
    "myfacet1": {
      "terms": {
        "field": "myfacet1",
        "size": 50
      }
    }
  }
}

is it possible to include a scripting section in it that would instruct
the facet to load (and cache) only a subset of all facet terms matching a
criterion (e.g. consisting of one word)?
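For reference, Jürgen's reply in this thread is that the facet always loads the full field into memory, so a criterion like this cannot shrink the cache. What such a criterion can do is trim which terms are reported. A hypothetical Python sketch of the two example criteria applied to already-computed counts (all names are illustrative, not an Elasticsearch API):

```python
from typing import Callable, Dict, List, Tuple


def one_word(term: str) -> bool:
    """Criterion from the question: the term consists of a single word."""
    return len(term.split()) == 1


def starts_with(prefix: str) -> Callable[[str], bool]:
    """Criterion from the question: the term starts with a given prefix."""
    return lambda term: term.startswith(prefix)


def filter_facet(counts: Dict[str, int], keep: Callable[[str], bool],
                 size: int = 50) -> List[Tuple[str, int]]:
    """Drop terms failing `keep`, then return the top `size` by count.

    Only the reported terms are trimmed; the counts (and any memory
    used to compute them) already exist at this point.
    """
    kept = [(t, c) for t, c in counts.items() if keep(t)]
    return sorted(kept, key=lambda kv: (-kv[1], kv[0]))[:size]
```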

Thank you!

-- Andy


(Jürgen kartnaller) #15

On Fri, Nov 4, 2011 at 5:18 PM, andym imwellnow@gmail.com wrote:

Is it possible to include a scripting section in it that would instruct
the facet to load (and cache) only a subset of all facet terms matching a
criterion (e.g. consisting of one word)?

No, the facet always pulls the full index of the field into memory.


(andym) #16

Hi Jürgen,

Is there a chance you could post your facet implementation that does not
use the field cache somewhere? If the code is not “release ready”, that's
perfectly OK; otherwise I will probably end up doing similar work to what
you have already done. In one of our scenarios the result set will be
rather small, a few thousand items, so retrieving data from stored fields
directly should give us very reasonable performance.

Alternatively, what is the way to aggregate values from stored fields
through scripting, as Shay suggests? From the docs I see that I can
retrieve a particular stored field value
(http://www.elasticsearch.org/guide/reference/api/search/script-fields.html),
but I could not find any example of how to aggregate these values over
the returned document set.
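One client-side workaround for a result set of a few thousand hits, sketched in Python: request the stored field for every hit and count on the client. The response shape assumed here (`hits.hits[*].fields[name]`, scalar or list) should be checked against the actual JSON your version returns; no HTTP call is made, and the client would still have to page through all hits, since a single page only covers that page.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def aggregate_stored_field(response: Dict[str, Any], name: str,
                           size: int = 10) -> List[Tuple[str, int]]:
    """Count values of stored field `name` across the hits of one search response."""
    counts: Counter = Counter()
    for hit in response.get("hits", {}).get("hits", []):
        value = hit.get("fields", {}).get(name)
        if value is None:
            continue  # field not stored or not requested for this hit
        counts.update(value if isinstance(value, list) else [value])
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
```

Summing the per-page counters across all pages gives the facet over the full result set.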

Thanks!

-- Andy


(system) #17