Last night we ran into an interesting issue. We pushed out a change to our
hosts via Puppet that installed Oracle's Java 7 as the default JRE/JDK on all
of our hosts; previously it had been the default only on a small subset of
our systems. When this happened, our Elasticsearch hosts broke in a fairly
spectacular way. The basic problem seems to be that swapping out the Java
binary caused the /etc/init.d/elasticsearch init script to believe the app
was not running (though it was), so Puppet started it up again. It looked
like this:
    puppet-agent[7069]: (/Stage[main]/Java::Jdk/Exec[set-licence-selected]/returns) executed successfully
    puppet-agent[7069]: (/Stage[main]/Java::Jdk/Apt::Source[oracle_java]/Apt::Key[Add key: EEA14886 from Apt::Source oracle_java]/Exec[164487e6b8d5245829c02e964fe69ec79110cb81]/returns) executed successfully
    puppet-agent[7069]: (/Stage[main]/Java::Jdk/Apt::Source[oracle_java]/File[oracle_java.list]/ensure) created
    puppet-agent[7069]: (/Stage[main]/Flume/Apt::Source[cdh4]/Apt::Key[Add key: 02A818DD from Apt::Source cdh4]/Exec[a8c3d5690bde3d926f373000d0a4b28ac782829e]/returns) executed successfully
    puppet-agent[7069]: (/Stage[main]/Flume/Apt::Source[cdh4]/File[cdh4.list]/ensure) created
    puppet-agent[7069]: (/Stage[main]/Apt::Update/Exec[apt_update]) Triggered 'refresh' from 2 events
    puppet-agent[7069]: (/Stage[main]/Java::Jdk/Package[oracle-java7-installer]/ensure) ensure changed 'purged' to 'present'
    puppet-agent[7069]: (/Stage[main]/Java::Jdk/Package[oracle-java7-set-default]/ensure) ensure changed 'purged' to 'present'
    puppet-agent[7069]: (/Stage[main]/Elasticsearch::Service/Service[elasticsearch]/ensure) ensure changed 'stopped' to 'running'
I want to stress that Elasticsearch was already running, but the Java
change seems to have tripped up the init script so that its 'status'
command returned a non-zero exit code, which made Puppet think it needed to
start Elasticsearch. As a result, we ended up running two ES daemons on
each of our nodes, and a whole ton of shard "reshuffling" occurred.
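To make the failure mode concrete, here is a minimal Python sketch of the
decision described above: run the init script's status command and start
the service only on a non-zero exit. It approximates how Puppet treats a
service whose init script has a working status command; it is not Puppet's
actual code, and ensure_running and the stand-in commands are hypothetical.

```python
import subprocess
import sys


def ensure_running(status_cmd, start_cmd):
    """Start a service only if its status command says it is stopped.

    Follows the init-script convention: exit code 0 means "running",
    and any non-zero code is treated as "stopped".
    """
    status = subprocess.run(status_cmd)
    if status.returncode == 0:
        return "already running"
    # If the status check is wrong (a false "stopped"), this launches a
    # second copy of a daemon that is in fact still running.
    subprocess.run(start_cmd, check=True)
    return "changed 'stopped' to 'running'"


if __name__ == "__main__":
    # Stand-in commands: a status check that exits 3 ("not running")
    # even though we pretend the service is actually up.
    fake_status = [sys.executable, "-c", "raise SystemExit(3)"]
    fake_start = [sys.executable, "-c", "print('starting a second daemon')"]
    print(ensure_running(fake_status, fake_start))
```

On a real host the arguments would be something like
["/etc/init.d/elasticsearch", "status"] and
["/etc/init.d/elasticsearch", "start"]. The whole decision rests on a single
exit code, so a status check that misreports a live daemon as stopped leads
straight to a duplicate start.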
Actually, the init scripts check the PID files, so I'm wondering what
happened here. Can you reliably reproduce it? If so, opening an issue would
be great.
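For reference, a PID-file status check of the kind described in the reply
can be sketched in Python as below. The exit codes follow the LSB convention
for an init script's status action (0 running, 1 dead but PID file exists,
3 not running, 4 unknown). The optional expected_exe comparison is a
hypothetical stricter variant, not taken from the Elasticsearch init script:
it shows one way a check could report a live JVM as stopped after the
default Java is switched. The function names are illustrative, and the
binary comparison assumes Linux (/proc).

```python
import os

# LSB exit codes for an init script's "status" action (subset).
RUNNING = 0
DEAD_WITH_PIDFILE = 1
NOT_RUNNING = 3
UNKNOWN = 4


def pid_alive(pid):
    """True if a process with this PID exists; signal 0 only probes."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # Exists, but owned by another user (e.g. the elasticsearch user).
        return True
    return True


def status_from_pidfile(pidfile, expected_exe=None):
    """Return an LSB status code based on a PID file.

    If expected_exe is given (hypothetical stricter check), also require
    that the process's binary matches the fully resolved expected path.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return NOT_RUNNING
    except (OSError, ValueError):
        return UNKNOWN
    if pid <= 0:
        return UNKNOWN
    if not pid_alive(pid):
        return DEAD_WITH_PIDFILE
    if expected_exe is not None:
        try:
            actual = os.readlink(f"/proc/{pid}/exe")  # Linux-only
        except OSError:
            return UNKNOWN
        # After switching the default Java, the expected path resolves to the
        # new JDK while the running JVM still uses the old binary.
        if actual != os.path.realpath(expected_exe):
            return NOT_RUNNING
    return RUNNING
```

The plain PID-file path (no expected_exe) keeps returning 0 for a live
daemon no matter which Java is now the default, which fits the puzzlement
above; a binary-matching step like the hypothetical one is one candidate to
look for when trying to reproduce the problem.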