Elasticsearch rpm and configuring garbage collection


(Jilles van Gurp) #1

I've been using the Elasticsearch RPMs (1.1.1) on our CentOS 6.5 setup, and
I've been wondering about the recommended way to configure it, given that the
package deploys an init.d script with defaults.

I figured out that I can use /etc/sysconfig/elasticsearch for things like
heap size. However, /usr/share/elasticsearch/bin/elasticsearch.in.sh
configures some defaults for garbage collection:

JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"

JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

So I'm getting a default garbage collection configuration that I probably
should be tuning, especially given that Elasticsearch runs out of memory
after a few weeks on our setup with Kibana and a rather large amount of
Logstash indices (over 200GB).

Is it possible to have a custom garbage collection strategy without
modifying files deployed and overwritten by the rpm? elasticsearch.in.sh
seems specific to the 1.1.1 version given that it also includes the
classpath definition.

In any case, it might be handy to clarify the recommended way to configure
Elasticsearch when deployed using the RPM as opposed to a developer machine
with a tarball. Most documentation I'm finding seems to assume the latter.

Jilles

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/182eb657-9503-42f6-8007-41150143fe46%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Radu Gheorghe) #2

Hi Jilles,

Any idea why you're running out of memory? You can monitor things like the
field data and filter caches and the JVM memory pools to get some clues.

I would assume your problem is that field data is accumulating, not the GC
settings. Depending on how much heap and how many nodes you have, and how
much heap is used for other things, I'd limit the field data cache to a
slice of the heap (for example, 30%).
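
As a rough sketch of the monitoring suggested above, the node stats API can
report JVM heap and memory pool usage as well as field data size. The host
and port below are assumptions (the default localhost:9200), and the function
names are made up for illustration.

```shell
#!/bin/sh
# Assumption: the node's HTTP API is reachable at localhost:9200 (the default).
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the node stats URL for one metric, e.g. "jvm" or "indices/fielddata".
node_stats_url() {
    printf '%s/_nodes/stats/%s?human&pretty\n' "$ES_URL" "$1"
}

# Fetch heap / memory pool usage and field data size from a live node.
# Run this by hand; it needs a running Elasticsearch node and curl.
show_memory_stats() {
    curl -s "$(node_stats_url jvm)"
    curl -s "$(node_stats_url indices/fielddata)"
}
```

Calling show_memory_stats every so often and watching whether the field data
size keeps growing should show whether field data is what fills the heap.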

Best regards,
Radu

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



(Jilles van Gurp) #3

Sounds like that could be the cause. What setting would I need to configure
for this? Regardless, I'd like to know where to start with configuring
garbage collection for ES.

Jilles



(Radu Gheorghe) #4

Hi,

The setting is indices.fielddata.cache.size. You can check out the docs for
more options, like adjusting the circuit breaker:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
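
A minimal sketch of applying that setting from the shell, assuming the RPM
keeps its config at /etc/elasticsearch/elasticsearch.yml (check your install).
The function name is hypothetical, and 30% is only the example value from
earlier in this thread.

```shell
#!/bin/sh
# Append a field data cache cap to an Elasticsearch config file,
# unless the setting is already present.
set_fielddata_cache_size() {
    conf_file="$1"   # path to elasticsearch.yml
    size="$2"        # e.g. 30% (of heap) or an absolute value such as 2gb
    if grep -q '^indices\.fielddata\.cache\.size:' "$conf_file"; then
        echo "indices.fielddata.cache.size already set in $conf_file" >&2
        return 0
    fi
    printf 'indices.fielddata.cache.size: %s\n' "$size" >> "$conf_file"
}

# Example (as root; path is an assumption):
#   set_fielddata_cache_size /etc/elasticsearch/elasticsearch.yml 30%
```

The node most likely has to be restarted before the new limit applies.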

To change the GC settings, I usually edit the elasticsearch.in.sh script. You
raise an interesting point, that it might be overwritten by an RPM upgrade.
I'm not aware of a way to override them; maybe somebody else is.
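
One possibility, not confirmed anywhere in this thread, is to put extra JVM
flags in /etc/sysconfig/elasticsearch via ES_JAVA_OPTS. This sketch assumes
the init script exports ES_JAVA_OPTS and that the start script passes it to
the JVM after the JAVA_OPTS built in elasticsearch.in.sh. HotSpot typically
honours the last occurrence of a repeated flag, so the value here would take
effect. The value 70 is purely illustrative.

```shell
# Hypothetical addition to /etc/sysconfig/elasticsearch.
# Assumption: this file is sourced by the init script and ES_JAVA_OPTS is
# appended to the JVM command line after the packaged JAVA_OPTS defaults.
ES_JAVA_OPTS="-XX:CMSInitiatingOccupancyFraction=70"
```

After a restart, the full command line of the Java process (for example via
ps) should show whether the extra flag actually made it through.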

Best regards,
Radu

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



(Michael Salmon) #5

On Friday, 25 April 2014 18:09:28 UTC+2, Jilles van Gurp wrote:

In any case, it might be handy to clarify the recommended way to configure
elasticsearch when deployed using the rpm as opposed to a developer machine
with a tar ball. Most documentation I'm finding seems to assume the latter.

Jilles

The way I handle configuration with RPMs is to create a separate directory
with config files for each cluster and then start the cluster with
-Des.config pointing to the config file to use. In that config file I set
path.conf to the config directory so that logging.yml can be found. I also
use a custom start script so that I can run ulimit and set a few parameters
before starting ES.
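
A start script along those lines might look roughly like the sketch below.
This is not the actual script. The directory, the file limit, the heap size
and the function names are all placeholders. It also assumes that the
per-cluster elasticsearch.yml sets path.conf to CLUSTER_DIR.

```shell
#!/bin/sh
# Hypothetical per-cluster wrapper; every path and value here is illustrative.
CLUSTER_DIR="${CLUSTER_DIR:-/etc/elasticsearch/cluster1}"
ES_BIN="${ES_BIN:-/usr/share/elasticsearch/bin/elasticsearch}"
MAX_OPEN_FILES="${MAX_OPEN_FILES:-65535}"

# Print the -Des.config argument that selects this cluster's config file.
es_config_arg() {
    printf '%s\n' "-Des.config=$CLUSTER_DIR/elasticsearch.yml"
}

# Raise the open-file limit, set the heap size, then start ES as a daemon.
start_cluster() {
    ulimit -n "$MAX_OPEN_FILES"
    ES_HEAP_SIZE="${ES_HEAP_SIZE:-4g}"
    export ES_HEAP_SIZE
    exec "$ES_BIN" -d "$(es_config_arg)"
}
```

Because the wrapper and its config directory live outside the files the RPM
owns, a package upgrade should not overwrite them.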

/Michael


