Memory usage per index

We have 2 machines with 16GB of RAM each, 13GB of which is given to the
JVM. We have 90 indices (5 shards, 1 replica each), one per day of data,
with 500GB of data on each node. 20% of memory goes to the field cache and
5% to the filter cache.
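
In case it matters, those limits come from elasticsearch.yml, set roughly
like this (0.90-era setting names, quoted from memory - check your
version's docs):

    indices.fielddata.cache.size: 20%
    indices.cache.filter.size: 5%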

The problem is that we have to keep shrinking the cache sizes because
memory usage grows over time. A cluster restart doesn't help. I assume the
indices themselves require some memory, but apparently there is no way to
find out how much memory each shard is using that cannot be freed by GC.
Right now 10-20% of CPU time is wasted on GC, and that is not what we'd
like to see.

Is there a way to reduce, or at least find out, the memory usage of
indices/shards? Ideally it would be cool if elasticsearch could "park" old
indices that are not used often - a kind of automatic open/close for
indices.

Whoa! A 13GB heap on a 16GB machine is too much. 8GB is better.
90 indices * 5 shards * 2 copies, spread over 2 nodes, is 450 shards, i.e. 450 Lucene instances on a single machine, in a JVM with only 13 (or 8) GB of RAM!

I think I would first decrease the number of shards to 1 per index.
I would also consider adding more nodes and/or more memory if possible.

Auto-closing indices does not exist: opening an index comes with a cost, so this is probably something you want to control yourself.
I would manage that at the client level.

You could also consider adding less expensive nodes to hold your old data, as you probably have fewer requests against it, and use shard allocation filtering (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-allocation.html#shard-allocation-filtering) to move your old indices to those nodes.
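
For example - a sketch only, with a made-up index name: give the cheap
nodes "node.tag: cold" in their elasticsearch.yml, then point old indices
at them:

    curl -XPUT 'localhost:9200/logs-2013-10-01/_settings' -d '{
      "index.routing.allocation.include.tag": "cold"
    }'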

HTH

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

Why should I decrease the heap size if IO saturation is low? And how can I
find out memory usage per Lucene instance? I will decrease the shard
count, but I'm just curious and haven't seen any mention of this in the
docs.

You're right, it sounds crazy. But Lucene makes heavy use of the filesystem cache. If you don't leave enough free RAM for the filesystem cache, Lucene won't be able to benefit from it.
So a common recommendation is to give only half of the RAM to the heap.

Granted, applying this alone won't reduce heap pressure.

I guess that since your data cannot fit in the heap (500GB vs 13GB), Lucene loads data from the filesystem very often, so the filesystem cache should play a big role in response time.
That said, if you have SSD drives, or if response time is not an issue for you, maybe keeping the heap at 13GB is fine.
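
In practice that just means starting each node with half the box, e.g.
with the standard startup script on a 16GB machine:

    ES_HEAP_SIZE=8g ./bin/elasticsearch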

Just my 2 cents here.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

Best practice is to use 50% of available RAM for the heap and let the OS
cache with the rest. IO will be low because that cache is so effective.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

IO is low already; I don't see any reason to grow the page cache. If
there's no heavy IO, there is no need for a big page cache. I thought
Lucene could operate in memory using the field cache. And field cache
population is so CPU-intensive that elasticsearch doesn't even want to
limit it by default, or make it softly referenced, to keep the server from
dying ("Huge GC load with populated field cache",
https://github.com/elasticsearch/elasticsearch/issues/3639). Am I missing
something about this idea?

I also don't query all indices all the time. Usually people are happy with
the latest data, and if people ask for a month of data, the page cache
won't help anyway.

I will reindex data to make 1 shard + 1 replica per index (kibana
can't take advantage of routing anyway) and that should be enough to
keep memory pressure low until disks are full.
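
For the new indices that's just an index template, something along these
lines (the "logstash-*" pattern is my assumption):

    curl -XPUT 'localhost:9200/_template/daily' -d '{
      "template": "logstash-*",
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1
      }
    }'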

Having some stats on per-shard memory usage would be great, though, along
with some explanation of where memory is used, how it can be used
effectively, and why.

Just my 2 cents too :)

Hi Ivan,

I skimmed this thread.

The main problem is time spent in GC, right? The "easy" thing to do is to
review your JVM version and its parameters, and watch GC metrics as you
change them. I like G1 :)

Per-shard memory stats don't seem to be possible yet.
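
Node level is as close as you can get for now: the nodes stats API breaks
out fielddata and filter cache sizes per node (exact output varies by
version):

    curl 'localhost:9200/_nodes/stats?pretty'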

Use DocValues for fields you sort or facet on - they use less heap. See
the graph comparison in the "Solr for Analytics" presentation on the
Sematext site (ignore the Solr bits; DocValues are at the Lucene level).
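
In elasticsearch this later surfaced as a fielddata format in the mapping.
A sketch, with invented index/type/field names, for whenever your version
supports it:

    curl -XPUT 'localhost:9200/logs/event/_mapping' -d '{
      "event": {
        "properties": {
          "status": {
            "type": "string",
            "index": "not_analyzed",
            "fielddata": { "format": "doc_values" }
          }
        }
      }
    }'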

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

The main problem is the shard count, actually. I backed up a month of data
(150 shards) and was able to increase the field cache size by 1.2GB. New
indices will have 1 shard, and that should be enough.

GC is not the root of the evil; it's better to fix causes, not
consequences :)

Final update for anyone who finds this thread through Google: upgrade to
0.90.9 and set index.codec.bloom.load=false on every index if you create
daily indices. This reduced the "cold" node memory footprint from 7GB to
2GB. Funny thing: the shard count doesn't actually matter; a smaller
number of shards only makes queries slower.
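
The setting should be dynamic, so it can be flipped on existing indices,
e.g. (index name made up):

    curl -XPUT 'localhost:9200/logstash-2013.11.21/_settings' -d '{
      "index.codec.bloom.load": false
    }'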
