ElasticSearch Init Script w/ Java Change


(Matt Wise) #1

Last night we ran into an interesting issue. We pushed out a change to our
hosts via Puppet that installed Oracle's Java 7 as the default JRE/JDK on all
of our hosts -- previously it had been the default only on a small subset
of our systems. When this happened, our ElasticSearch hosts broke in a
fairly spectacular way. The basic problem seems to be that changing out the
Java binary caused the /etc/init.d/elasticsearch init script to believe the
app was not running (though it was), and therefore Puppet started it up. It
looked like this:

puppet-agent[7069]: (/Stage[main]/Java::Jdk/Exec[set-licence-selected]/returns) executed successfully
puppet-agent[7069]: (/Stage[main]/Java::Jdk/Apt::Source[oracle_java]/Apt::Key[Add key: EEA14886 from Apt::Source oracle_java]/Exec[164487e6b8d5245829c02e964fe69ec79110cb81]/returns) executed successfully
puppet-agent[7069]: (/Stage[main]/Java::Jdk/Apt::Source[oracle_java]/File[oracle_java.list]/ensure) created
puppet-agent[7069]: (/Stage[main]/Flume/Apt::Source[cdh4]/Apt::Key[Add key: 02A818DD from Apt::Source cdh4]/Exec[a8c3d5690bde3d926f373000d0a4b28ac782829e]/returns) executed successfully
puppet-agent[7069]: (/Stage[main]/Flume/Apt::Source[cdh4]/File[cdh4.list]/ensure) created
puppet-agent[7069]: (/Stage[main]/Apt::Update/Exec[apt_update]) Triggered 'refresh' from 2 events
puppet-agent[7069]: (/Stage[main]/Java::Jdk/Package[oracle-java7-installer]/ensure) ensure changed 'purged' to 'present'
puppet-agent[7069]: (/Stage[main]/Java::Jdk/Package[oracle-java7-set-default]/ensure) ensure changed 'purged' to 'present'
puppet-agent[7069]: (/Stage[main]/Elasticsearch::Service/Service[elasticsearch]/ensure) ensure changed 'stopped' to 'running'

I want to stress, ElasticSearch was already running ... but the Java
change seems to have tripped up the init script so that its 'status'
command returned a >0 exit code, causing Puppet to think it needed to start
up ElasticSearch. When this happened, we ended up running two ES daemons on
each of our nodes, and a whole ton of "reshuffling" occurred.

Is this a design feature? Bug? Thoughts?
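
The failure described above comes down to two signals disagreeing: the init script's 'status' exit code says "stopped" while a JVM is still alive. A minimal sketch of a guard that compares the two before anything is started might look like the following. The function name `reconcile` and its three outcomes are hypothetical, and nothing like it is claimed to exist in Puppet or in the packaged init script.

```shell
#!/bin/sh
# Hypothetical guard: decide what to do from two independent signals.
#   $1 = exit code of the init script's "status" command (0 means running)
#   $2 = number of Elasticsearch JVMs found in the process table
reconcile() {
    status_rc="$1"
    running="$2"
    if [ "$status_rc" -eq 0 ]; then
        # Status agrees that the service is up; nothing to do.
        echo "noop"
    elif [ "$running" -gt 0 ]; then
        # Status says stopped but a JVM is alive: starting now would
        # create a second daemon, so flag it instead of starting.
        echo "conflict"
    else
        # Status says stopped and no JVM is present: safe to start.
        echo "start"
    fi
}

# Example wiring (assumes `service` and procps `pgrep` are installed, and
# that the JVM command line contains this main class name):
#   service elasticsearch status >/dev/null 2>&1; rc=$?
#   count=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch | wc -l | tr -d ' ')
#   reconcile "$rc" "$count"
```

Keeping the decision separate from the probing makes the "conflict" case explicit, which is the case that produced two daemons per node here.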

Matt Wise
Sr. Systems Architect
Nextdoor.com

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAOHkZxO-0%3DA0QDSHmek18GauQsjNqLCnju1FHfhN8NW1MLfNqQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

I'd say it was the Java swap that caused it, as ES will not start another
process if it can see one running:

markw@es00-fv:~$ ps -ef|grep java
106 20801 1 5 Feb25 ? 1-14:27:46 /usr/bin/java -Xms4g
-Xmx4g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Delasticsearch -Des.pidfile=/var/run/elasticsearch.pid
-Des.path.home=/usr/share/elasticsearch -cp
:/usr/share/elasticsearch/lib/elasticsearch-1.0.0.jar:/usr/share/elasticsearch/lib/:/usr/share/elasticsearch/lib/sigar/
-Des.default.config=/etc/elasticsearch/elasticsearch.yml
-Des.default.path.home=/usr/share/elasticsearch
-Des.default.path.logs=/var/log/elasticsearch
-Des.default.path.data=/var/lib/elasticsearch
-Des.default.path.work=/tmp/elasticsearch
-Des.default.path.conf=/etc/elasticsearch
org.elasticsearch.bootstrap.Elasticsearch
markw 24590 24487 0 08:18 pts/0 00:00:00 grep java
markw@es00-fv:~$ sservice elasticsearch status
[sudo] password for markw:
 * elasticsearch is running
markw@es00-fv:~$ sservice elasticsearch start
 * Starting Elasticsearch Server
 * Already running.
                                                             [ OK ]

markw@es00-fv:~$ ps -ef|grep java
106 20801 1 5 Feb25 ? 1-14:27:48 /usr/bin/java -Xms4g
-Xmx4g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Delasticsearch -Des.pidfile=/var/run/elasticsearch.pid
-Des.path.home=/usr/share/elasticsearch -cp
:/usr/share/elasticsearch/lib/elasticsearch-1.0.0.jar:/usr/share/elasticsearch/lib/:/usr/share/elasticsearch/lib/sigar/
-Des.default.config=/etc/elasticsearch/elasticsearch.yml
-Des.default.path.home=/usr/share/elasticsearch
-Des.default.path.logs=/var/log/elasticsearch
-Des.default.path.data=/var/lib/elasticsearch
-Des.default.path.work=/tmp/elasticsearch
-Des.default.path.conf=/etc/elasticsearch
org.elasticsearch.bootstrap.Elasticsearch
markw 24626 24487 0 08:18 pts/0 00:00:00 grep java
markw@es00-fv:~$

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 26 March 2014 03:51, Matt Wise matt@nextdoor.com wrote:



(Alexander Reelsen) #3

Hey,

Actually, the init scripts check the pid files, so I'm wondering what
happened here. Can you reliably reproduce it? If so, an issue would be
great.
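
A pid-file check of the kind mentioned here generally works as sketched below. This is an illustration only, not the script shipped in the Elasticsearch package. The function name `pidfile_status` is hypothetical, and the return values follow the LSB convention for 'status' actions (0 running, 1 dead with a pid file present, 3 not running).

```shell
#!/bin/sh
# Hypothetical pid-file status check; not the packaged init script.
#   $1 = path to the pid file (e.g. /var/run/elasticsearch.pid)
# Returns: 0 = running, 1 = pid file exists but process is gone
#          (or the file is unreadable/corrupt), 3 = not running.
pidfile_status() {
    pidfile="$1"
    # No pid file at all: treat the service as not running.
    [ -f "$pidfile" ] || return 3
    pid=$(cat "$pidfile" 2>/dev/null)
    # Empty or non-numeric content: treat as a stale pid file.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 sends nothing; it only tests that the process exists.
    # This needs permission to signal the process, so run it as root
    # (as init scripts normally are) or as the service user.
    if kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1
}
```

A check that relies only on the pid keeps reporting "running" when the binary on disk is replaced. Helpers that additionally compare the process's executable path against the configured one can disagree in that situation; whether that is what happened in this thread is not established.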

--Alex

On Tue, Mar 25, 2014 at 10:19 PM, Mark Walkom markw@campaignmonitor.com wrote:


