AWS, Ubuntu, and Maverick

Hi,

Wanted to share with ES users a nice little story about using ES on AWS, with a very important note at the end for anybody running ES on AWS:

An ES user (a really cool company doing great stuff with ES; if the company is listening, it would be great if you wanted to share how you use ES :slight_smile: ) was having problems with ES during load testing.

Every once in a while, ES would freeze up for a few minutes and then "release". This can happen for several reasons, but looking at the logs we saw ES messages reporting GC (garbage collection) runs on ParNew that took several minutes.

A GC collection of several minutes on the ParNew (young) generation of the JVM is very dubious. It can happen for several reasons: the ES process being swapped out (which is really bad for JVM-based processes), or a possible bug in the JVM (there are several listed as fixed across the different JVM versions).
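
As a quick aside, if you suspect swapping, a couple of standard Linux commands will tell you (ES_PID below is just a placeholder for the Elasticsearch process id):

vmstat 1 5                          # non-zero si/so columns mean the box is swapping
grep VmSwap /proc/$ES_PID/status    # per-process swap usage (on newer kernels)
sudo swapoff -a                     # if you want to rule swap out entirely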

I won't go through all the things we tried in order to fix it, but the problem ended up being Ubuntu 10.04. Once we upgraded to Ubuntu 10.10, the problem was solved.

I don't know what change in Ubuntu fixed the problem. What I do know is that people had been running 10.04 on AWS for a long time with no problems, which leads me to suspect that AWS changed something (like changing their Xen version) that triggered this problem. Hopefully, now that they provide Elastic Beanstalk, they will start testing more carefully the implications that their internal upgrades have on the JVM.

Funnily enough, I started to hear about this from several users around the same time, and for all of them, upgrading to Ubuntu 10.10 solved the problem.

So, if you are running Ubuntu 10.04 on AWS, make sure you upgrade to 10.10.
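
If you are not sure which release a given instance is running, either of these will tell you:

lsb_release -a
cat /etc/lsb-release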

-shay.banon

One more thing I forgot to mention (low level): when turning on GC logging, one would see something like this:

3856.471: [GC 3856.471: [ParNew: 118016K->13056K(118016K), 37.8958700 secs] 689035K->593566K(8375552K), 37.8959970 secs] [Times: user=0.00 sys=0.00, real=37.90 secs]

As you can see, the collection spent no user/sys time, which basically means it was not doing any work, yet it blocked for ~38 seconds of real (wall clock) time. My hunch is some kind of locking problem / bug.
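
For reference, this kind of GC log output is what you get when starting the JVM with the standard HotSpot logging flags, along these lines (the log file path is just an example):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/elasticsearch/gc.log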

So Shay is referring to Assistly. We were seeing some really bizarre things on certain Ubuntu 10.04 machines on AWS, and after working closely with Shay and another large-scale ES user, we decided to try 10.10. It solved all of our GC lockup problems.

I wish I could say we knew why 10.10 made all the difference, but we currently don't. I am happy to report that since we switched our ES boxes to 10.10 we have seen outstanding ES performance with zero GC lockup issues for over 2 weeks.

I will second Shay's comment - if you are running ES on Ubuntu 10.04 on AWS, you should upgrade to 10.10.

Brad Birnbaum
CTO, Assistly
brad@assistly.com m: 516-308-2286

Which JDK and version were you using, and are you using the same one now?

Very interesting information, thanks!

We were using several different ones and all had the same issues, but we are currently on u23.
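
(If you need to check which build a box is running, java -version will print it, e.g. "1.6.0_23" for u23.)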

Brad Birnbaum
CTO, Assistly
brad@assistly.com m: 516-308-2286

Not a JVM problem; it's lower level, kernel or libc, difficult to tell due to the nature of the problem. Here's the Canonical bug:

Bug #708920 "Strange 'fork/clone' blocking behavior under high c..." : Bugs : linux-ec2 package : Ubuntu

and some additional data about how others have seen it manifested (Cassandra freezes under load when using libc6 2.11.1-0ubuntu7.5):

http://www.mail-archive.com/user@cassandra.apache.org/msg08696.html
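
If you want to check whether a given 10.04 box is on the libc6 build and kernel discussed there, something like this will show both:

dpkg -l libc6 | grep libc6    # the affected Cassandra reports mention 2.11.1-0ubuntu7.5
uname -r                      # running kernel version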

Thanks. I read about the SimpleGeo issue as it was happening but wasn't sure if it was related to what we were experiencing.

Thanks for the heads up on this.

I also experienced this problem on 10.04 m1.large EC2 instances. The machines, running only ES 0.14.2, with no swap and 50% free memory, would lock up for a few minutes with the load going up to 20+, and obviously get kicked out of the cluster, at a rate of a few times per day.

I am planning on upgrading my cluster instances to 10.10 tomorrow. I have already tested the following EC2 10.04 -> 10.10 upgrade procedure:

sudo sed -i.bak -e 's@lts$@normal@' /etc/update-manager/release-upgrades
sudo aptitude update
sudo aptitude dist-upgrade
sudo poweroff

At this point the instance is upgraded, but we need to attach the 10.10 kernel to it. Find the right kernel id; the latest 10.10 kernel, 2.6.35-25-virtual for x86_64, is aki-427d952b. Using the EC2 API tools, run:

ec2-modify-instance-attribute i-xxx --kernel aki-427d952b

where i-xxx is your instance id

start your instance, enjoy 10.10.
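
For what it's worth, the full stop / kernel swap / start cycle with the EC2 API tools looks roughly like this (EBS-backed instances only; i-xxx is a placeholder as above):

ec2-stop-instances i-xxx                                   # or just let 'sudo poweroff' stop it
ec2-modify-instance-attribute i-xxx --kernel aki-427d952b
ec2-start-instances i-xxx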

Colin

I forgot to include the actual upgrade command in my sequence :stuck_out_tongue: It should be:

sudo sed -i.bak -e 's@lts$@normal@' /etc/update-manager/release-upgrades
sudo aptitude update
sudo aptitude dist-upgrade
sudo do-release-upgrade
# about 15 minutes of updates, with a few y/n prompts
sudo poweroff
# attach the 10.10 kernel (ec2-modify-instance-attribute, as above)
# start your instance, enjoy 10.10

Anyway, my rolling cluster upgrade from 10.04 to 10.10 is in progress; no problems so far.

Colin
