Cluster crash on query

We had a 0.19.11 cluster in production for a few weeks. One of our devs
rolled out some new code and the cluster came down hard: it first
complained of missing shards, then appeared to recover once I began
manually reindexing data.

No migrations were included in the deploy, only some new queries (which
unfortunately I don't have with me now, but I can add tomorrow).

The logs from the three nodes are at:

https://s3.amazonaws.com/99designs-elasticsearch-logs/ip-10-29-24-75.log
https://s3.amazonaws.com/99designs-elasticsearch-logs/ip-10-64-38-196.log
https://s3.amazonaws.com/99designs-elasticsearch-logs/ip-10-85-75-6.log

Any insight would be welcome. I've replaced the cluster with one running
0.20.1 for now, but I have no guarantee that the underlying issue is solved.

Richo

--

Hello Richo,

From the logs, I think your nodes simply got too busy, and as a result
they couldn't see each other. So I think you need to do one of the following:

  • optimize the performance on what you already have (if that's possible)
  • add more nodes
  • use bigger nodes

To provide more help, one would need some more information, such as:

  • how many nodes do you have, and how big are they?
  • what's the ES configuration, especially around discovery?
  • how many indices and shards do you have, and how much data is in them?
    What does the data look like (mapping)?
  • what do the new queries look like?
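Most of these can be read straight off the cluster's HTTP API; here is a rough sketch of the relevant endpoints (paths as of the 0.19/0.20-era REST API; the host, port, and index name "my_index" are placeholders):

```python
# Sketch: REST endpoints that answer the questions above.
# "localhost:9200" and "my_index" are placeholders; adjust for your cluster.
base = "http://localhost:9200"

endpoints = [
    ("node count and sizes",       "/_cluster/nodes?pretty=true"),
    ("cluster and shard health",   "/_cluster/health?pretty=true"),
    ("shard allocation / routing", "/_cluster/state?pretty=true"),
    ("index mapping",              "/my_index/_mapping?pretty=true"),
]

# Print a ready-to-run curl command for each question.
for what, path in endpoints:
    print(f"{what}: curl {base}{path}")
```

Pasting the output of these (plus your elasticsearch.yml) into the thread would cover most of the questions above.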

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Sun, Dec 16, 2012 at 1:00 PM, Richo Healey healey.rich@gmail.com wrote:


--

Hi Radu,

I can certainly add more/larger nodes, but that doesn't seem to be the
issue: from what I can see the cluster didn't fall over under load, it
actually seems to have lost track of its shards.

I spun up an old cluster this morning and, with it completely idle, sent
it the query that broke this one. It immediately went red and started
throwing the same errors.

The cluster in question is 3x m1.large instances, using the cloud-aws
plugin for discovery, fetching nodes from the security group (there are
only 3 nodes). There is only one index, with 5 shards and 2 replicas.
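For reference, the discovery setup described above would look roughly like this in elasticsearch.yml (setting names are from the cloud-aws plugin of that era; the credentials and group name are placeholders):

```yaml
# Hypothetical sketch of security-group-based EC2 discovery via the
# cloud-aws plugin. All values below are placeholders.
cloud:
  aws:
    access_key: AKIA...              # placeholder
    secret_key: ...                  # placeholder
discovery:
  type: ec2
  ec2:
    groups: my-es-security-group     # nodes are discovered via this security group
```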

I'll dump the mapping and the query in a moment; there are about 180k
records.
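For what it's worth, the shard layout described above means every node carries a full copy of the index; a quick sketch of the arithmetic (numbers taken from this thread):

```python
# Shard arithmetic for the setup described in this thread:
# 1 index, 5 primary shards, 2 replicas, 3 nodes.
primaries = 5
replicas = 2
nodes = 3

shard_copies = primaries * (1 + replicas)  # 5 primaries + 10 replicas = 15
per_node = shard_copies // nodes           # 5 shard copies per node when balanced

print(shard_copies, per_node)  # 15 5
```

Since each shard exists in three copies spread over three nodes (Elasticsearch won't allocate two copies of the same shard to one node), every node ends up holding a complete copy of the index, so a pathological query can hit data on all nodes at once.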

On Monday, 17 December 2012 23:45:50 UTC+11, Radu Gheorghe wrote:
