Master node keeps crashing


(Gaurav Arora) #1

I'm having a very odd problem with one of my Elasticsearch clusters. The
master node on the cluster crashes randomly. The cluster is running on 3
different EC2 instances.

My cluster configuration is:
3 nodes (1 master, all data nodes)
600 GB of data (3k IOPS EBS volumes)
700 million documents

There are no hprof files anywhere, so I don't think this was a memory
problem. I have set the log level to DEBUG, but the logs contain no
information at all.

I'm not sure where to start debugging this problem. I realise my post is
not very helpful but if I could get pointers on where to start I will
produce logs.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/36f06a1c-1c5d-4468-87c8-920f56b30efd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

How much RAM per node, which Java flavour and version, and which ES version?

Are the logs showing any OOM?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

In reply to Gaurav Arora's post of 10 June 2014 17:16.


(Gaurav Arora) #3

I am using the latest OpenJDK 7, installed from the Ubuntu repos.

ubuntu@es1:~$ java -version
java version "1.7.0_51"
OpenJDK Runtime Environment (IcedTea 2.4.6) (7u51-2.4.6-1ubuntu4)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)

ES is set to run with -Xms14075m -Xmx14075m with bootstrap.mlockall set to
true.

The instance is an r3.large instance.

There is no OOM in the logs anywhere. The last line of the log shows that
an index was created about 15-20 minutes before the crash. That's it.
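
Since the question about RAM per node is still open, one quick check is how much of the machine's memory the locked heap takes up. The sketch below is only an illustration: it parses a heap flag such as the `-Xmx14075m` above and the `MemTotal` line from `/proc/meminfo` (Linux only), then reports the heap as a fraction of physical RAM. The function names are made up for this example and are not part of Elasticsearch or the JVM.

```python
import re

# Multipliers for the size suffixes the JVM accepts on -Xms/-Xmx.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_heap_flag(flag):
    """Return the size in bytes of a JVM flag such as '-Xmx14075m'."""
    match = re.fullmatch(r"-Xm[sx](\d+)([kKmMgG]?)", flag)
    if match is None:
        raise ValueError("not a -Xms/-Xmx flag: %r" % flag)
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def parse_mem_total(meminfo_text):
    """Return MemTotal in bytes from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The kernel reports this value in kB (units of 1024 bytes).
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def heap_fraction(flag, meminfo_text):
    """Heap size divided by physical RAM, as a float."""
    return parse_heap_flag(flag) / parse_mem_total(meminfo_text)
```

On one of the nodes this could be called as `heap_fraction("-Xmx14075m", open("/proc/meminfo").read())`. A result close to 1.0 would mean the locked heap leaves very little memory for the operating system and the file cache.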

In reply to Mark Walkom's post of Tuesday, June 10, 2014 12:53:41 PM UTC+5:30.


(sirkubax) #4

curl -XGET 'http://localhost:9200/_nodes/_all/process?pretty=true' |less

Have you considered "max_file_descriptors"?
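
The response to that request is JSON keyed by node id, so the limit can be pulled out per node with a few lines of code. This is a minimal sketch, assuming the response has a `nodes` object whose entries carry `name` and `process.max_file_descriptors` (the layout the node info API used in the 1.x releases; check it against your version). The function name and the sample payload are invented for illustration.

```python
import json


def max_file_descriptors(nodes_info):
    """Map node name -> max_file_descriptors from a node-info response.

    Nodes that do not report the field map to None.
    """
    result = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        # Fall back to the node id if the node has no name.
        name = node.get("name", node_id)
        result[name] = node.get("process", {}).get("max_file_descriptors")
    return result


# Hypothetical payload, trimmed to the fields the function reads.
SAMPLE = json.loads("""
{
  "cluster_name": "example",
  "nodes": {
    "node-id-1": {"name": "es1", "process": {"max_file_descriptors": 65535}},
    "node-id-2": {"name": "es2", "process": {}}
  }
}
""")

for name, limit in sorted(max_file_descriptors(SAMPLE).items()):
    print(name, limit)
```

To run it against a live node, fetch the URL above (for example with `urllib.request.urlopen`), decode the body with `json.loads`, and pass the result to `max_file_descriptors`.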

In reply to Gaurav Arora's post of Tuesday, 10 June 2014 09:36:35 UTC+2.


(Gaurav Arora) #5

The max file descriptors are all set to 64k.

This is the output from one of the slave nodes:
http://pastebin.com/RdmZsJbH

In reply to sirkubax's post of Tue, Jun 10, 2014 at 2:50 PM.

