Documentation for node level caching and a few caching related questions


(ppearcy) #1

Hi,
Came across this feature:
http://github.com/elasticsearch/elasticsearch/issues/issue/235

But was not able to find it reflected in the docs. I figured it would
be in the Node settings.

Also, does this cache hold data for filter queries or other items, as
well?

While on the topic of caching, I was curious, at what level of
granularity are filter queries cached?

For example, in the sample query here:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/and_filter/

Are there two cache objects created for each filter (separate date and
name caches) or one cache that holds the results of everything
specified in filters?

How does this work in the case of nested filters? For the example
below, I'd guess one cached item for each filters grouping, so two
cached items: one for the main filters and the other for the
sub-filters.

{
  'query' : {
    'constant_score' : {
      'filter' : {
        'and' : {
          'filters' : [
            { 'term' : {'data' : 600} },
            { 'term' : {'symbol' : 'msft'} },
            { 'or' : {
              'filters' : [
                { 'term' : {'Language' : 'en'} },
                { 'term' : {'Language' : 'fr'} }
              ]
            }}
          ]
        }
      }
    }
  }
}

Thanks!
Paul


(Shay Banon) #2

On Wed, Jul 28, 2010 at 8:00 AM, Paul ppearcy@gmail.com wrote:

Hi,
Came across this feature:
http://github.com/elasticsearch/elasticsearch/issues/issue/235

This only relates to the case where you store the index in memory. Actually,
I have not documented that change yet. It basically pre-allocates memory for
the index to use, and this pre-allocated memory is now shared between
shards.

But was not able to find it reflected in the docs. I figured it would
be in the Node settings.

Also, does this cache hold data for filter queries or other items, as
well?

There are different caches for different aspects in elasticsearch. The above
cache only relates to storing the index in memory.

While on the topic of caching, I was curious, at what level of
granularity are filter queries cached?

That's a bit complicated when it comes to Lucene. In general, you shouldn't
really need to care about it. If you are familiar with Lucene, filters are
cached at the IndexReader level.
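
To make "cached on an IndexReader level" a bit more concrete, here is a minimal Python sketch. It is not Lucene or elasticsearch code: `SegmentReader` and `PerReaderFilterCache` are hypothetical names, and keying the cache weakly by reader is an assumption about one reasonable way to do it. The point is only that a filter result computed for one reader is never reused for another reader, and goes away together with its reader.

```python
import weakref


class SegmentReader:
    """Hypothetical stand-in for a Lucene IndexReader over one set of documents."""

    def __init__(self, docs):
        self.docs = docs  # list of dicts; the position in the list is the doc id


class PerReaderFilterCache:
    """Caches matching doc ids per (reader, filter key); illustrative only."""

    def __init__(self):
        # Weak keys: once a reader is garbage collected, its entries go with it.
        self._by_reader = weakref.WeakKeyDictionary()
        self.computations = 0

    def term_filter(self, reader, field, value):
        per_reader = self._by_reader.setdefault(reader, {})
        key = (field, value)
        if key not in per_reader:
            self.computations += 1
            per_reader[key] = frozenset(
                doc_id for doc_id, doc in enumerate(reader.docs)
                if doc.get(field) == value
            )
        return per_reader[key]


cache = PerReaderFilterCache()
old_reader = SegmentReader([{"symbol": "msft"}, {"symbol": "goog"}])
new_reader = SegmentReader([{"symbol": "msft"}])

first = cache.term_filter(old_reader, "symbol", "msft")
again = cache.term_filter(old_reader, "symbol", "msft")  # served from the cache
other = cache.term_filter(new_reader, "symbol", "msft")  # new reader, recomputed
```

In this sketch the second call on `old_reader` returns the cached object, while the call on `new_reader` triggers a fresh computation even though the filter is the same.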

For example, in the sample query here:

http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/and_filter/

Are there two cache objects created for each filter (separate date and
name caches) or one cache that holds the results of everything
specified in filters?

It depends on the filter / query (that accepts a filter). In the case of
AndFilter, only the inner filters are cached. The combined result is not
cached, as it makes little sense to cache it given the And filter
implementation.
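
As a rough illustration of that granularity, the following Python sketch evaluates the nested query from the original question while caching only the leaf `term` filters. `LeafFilterCache` is a hypothetical name, the sample documents are made up, and treating `or` the same way as `and` is an assumption that the thread does not confirm.

```python
class LeafFilterCache:
    """Illustrative cache: only leaf (term) filters are stored, never and/or results."""

    def __init__(self, docs):
        self.docs = docs
        self.cached = {}  # (field, value) -> frozenset of matching doc ids

    def evaluate(self, node):
        (kind, body), = node.items()
        if kind == "term":
            (field, value), = body.items()
            key = (field, value)
            if key not in self.cached:
                self.cached[key] = frozenset(
                    i for i, doc in enumerate(self.docs) if doc.get(field) == value
                )
            return self.cached[key]
        # Compound filters are recombined from their parts on every call.
        parts = [self.evaluate(child) for child in body["filters"]]
        if kind == "and":
            return frozenset.intersection(*parts)
        if kind == "or":
            return frozenset.union(*parts)
        raise ValueError(f"unsupported filter type: {kind}")


docs = [
    {"data": 600, "symbol": "msft", "Language": "en"},
    {"data": 600, "symbol": "msft", "Language": "de"},
    {"data": 600, "symbol": "goog", "Language": "fr"},
]
nested = {"and": {"filters": [
    {"term": {"data": 600}},
    {"term": {"symbol": "msft"}},
    {"or": {"filters": [
        {"term": {"Language": "en"}},
        {"term": {"Language": "fr"}},
    ]}},
]}}

cache = LeafFilterCache(docs)
matches = cache.evaluate(nested)
```

Under these assumptions the cache ends up with four entries, one per `term` filter, and none for the `and` or `or` groupings.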



(ppearcy) #3

Hi Shay,
Many thanks for the details.

If using FS-based index storage, are there any caching settings
available?

Best Regards,
Paul



(Shay Banon) #4

In elasticsearch, there are two more caches: the first is the filter cache,
and the second is what I call the field data cache (field data is used when
sorting, faceting, or using scripts). It uses JVM capabilities to cache, so
it's not like an LRU where you would configure the number of cache entries,
eviction strategy, and so on. The only option is to disable it, or to choose
between a weak and a soft cache.

-shay.banon
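
For readers unfamiliar with reference-based caches, here is a small Python analogy. It is only a sketch: Python's standard library has weak references but no direct equivalent of JVM soft references (which are typically kept until memory runs low), so this shows the weak flavor only. `DocIdSet` and the cache key are hypothetical names, not elasticsearch settings.

```python
import gc
import weakref


class DocIdSet:
    """Hypothetical cached value; a plain class so it can be weakly referenced."""

    def __init__(self, doc_ids):
        self.doc_ids = frozenset(doc_ids)


# No maximum size and no eviction policy: the garbage collector decides what stays.
filter_cache = weakref.WeakValueDictionary()

result = DocIdSet([1, 5, 9])
filter_cache["symbol:msft"] = result

hit_while_referenced = filter_cache.get("symbol:msft") is result

del result    # drop the only strong reference to the cached value
gc.collect()  # CPython frees it right away; collect() helps on other runtimes

hit_after_release = filter_cache.get("symbol:msft") is not None
```

The entry is available while something else still holds the value and disappears once nothing does, which is why there are no entry counts or eviction strategies to configure in this style of cache.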



(Otis Gospodnetić) #5

Hi Shay,

I imagine the filter cache is actually ES-specific code. Is the field
data cache also ES-specific, or are you referring to Lucene's
FieldCache?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



(Shay Banon) #6

The field data cache is elasticsearch-specific. It replaces Lucene's
FieldCache and serves a similar purpose, with extended functionality such
as the ability to use it for other cases like facets and scripts. I tried
to get some of these enhancements (like using a concurrent soft map) into
Lucene, but got pushed back.

-shay.banon
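
As a loose illustration of what field data is used for, the Python sketch below loads per-document values for a field once and reuses them for sorting and for a facet count. The thread does not describe how elasticsearch lays out field data internally; the doc-id-indexed list (similar in spirit to Lucene's FieldCache arrays), the `load_field_data` helper, and the sample documents are all assumptions.

```python
from collections import Counter


def load_field_data(docs, field):
    """Build a doc-id-indexed list of values for one field (illustrative only)."""
    return [doc.get(field) for doc in docs]


docs = [
    {"symbol": "msft", "price": 25},
    {"symbol": "goog", "price": 480},
    {"symbol": "msft", "price": 26},
]

# Loaded once, then reused by every sort or facet request on the same field.
field_data = {
    "symbol": load_field_data(docs, "symbol"),
    "price": load_field_data(docs, "price"),
}

matching_doc_ids = [0, 1, 2]

# Sorting: order the matching doc ids by their per-document value, highest first.
sorted_by_price = sorted(
    matching_doc_ids, key=lambda d: field_data["price"][d], reverse=True
)

# Faceting: count per-document values across the matching docs.
symbol_facet = Counter(field_data["symbol"][d] for d in matching_doc_ids)
```

Both operations read the same in-memory values by doc id, which is why keeping them cached pays off across requests.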


