Warning that crashes the server


(Szymon Gwóźdź) #1

Hi!

I've launched a 4-machine ES cluster. 2 machines are data nodes, the other 2
machines are non-data nodes.
I've started putting data (via a prepared script) from 5 other machines. It is a
lot of data (about 40 documents (~200 kB) per second coming from each machine).
After a couple of minutes I get a log like this:

[11:17:25,431][WARN ][transport.netty ] [Cold War] Exception caught on netty layer [[id: 0x03435ec9, /10.1.49.90:49601 => fts-test-4/10.1.49.89:9300]]
org.elasticsearch.transport.ResponseHandlerNotFoundTransportException: Transport response handler not found of id [166613]
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:71)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:391)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:345)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:332)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:323)
    at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:275)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:196)
    at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

And then this server crashes (because another node takes over the "Master" role).
There is enough available memory. What is the problem?

cheers
Szymon Gwóźdź


(Shay Banon) #2

Sadly, this exception is caused by another problem that happened on another
node. Which version are you running? If you can try with master (or wait a
couple days for 0.7) that would be great.

cheers,
shay.banon



(Szymon Gwóźdź) #3

Hi!

I'm using master (v0.6.0-132-g040030d).

cheers
Szymon Gwóźdź



(Shay Banon) #4

Master is on version 0.7 (which will be released in a couple of days), so
it's strange that you see 0.6...

cheers,
shay.banon



(Szymon Gwóźdź) #5

That's the name of the downloaded file. Even now, when I choose to download
master, the name of the downloaded file contains "v0.6.0".



(Shay Banon) #6

I just downloaded it and got a file named elasticsearch-elasticsearch-a0b25ec.
It's from here: http://www.elasticsearch.com/download/master/. Also, the
version is printed in the logs / console when elasticsearch starts up.

shay.banon



(Bart Schuller) #7

On May 12, 2010, at 13:10, Szymon Gwóźdź wrote:

That's the name of the downloaded file. Even now, when I choose to download master, the name of the downloaded file contains "v0.6.0".

That should be OK. A string like "v0.6.0-132-g040030d" is the output of "git describe", which shows the most recent tag (0.6.0, because 0.7.0 hasn't been released yet), followed by the number of commits since that tag, followed by the short hash that identifies the latest commit.
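
To make the format concrete, here is a minimal Python sketch that splits such a string into its three parts. The helper name parse_git_describe is made up for illustration; the only facts it relies on are the tag-commits-hash layout described above and the "g" that git puts in front of the short hash.

```python
import re

# Layout of `git describe` output: <tag>-<commits since tag>-g<short hash>.
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_git_describe(text):
    """Split a `git describe` string into (tag, commits_since_tag, short_hash).

    Hypothetical helper for illustration only.
    """
    text = text.strip()
    match = DESCRIBE_RE.match(text)
    if match is None:
        # A commit that sits exactly on a tag is described by the tag alone.
        return text, 0, None
    return match.group("tag"), int(match.group("commits")), match.group("sha")


tag, commits, sha = parse_git_describe("v0.6.0-132-g040030d")
print(tag, commits, sha)  # v0.6.0 132 040030d
```

So the download in question is 132 commits past the 0.6.0 tag, at commit 040030d.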

--
Bart.Schuller@gmail.com


(Shay Banon) #8

OK, so let's verify it: when you start the server, do you see the 0.7 snapshot
version?



(Szymon Gwóźdź) #9

Yes, I used the 0.7 snapshot version.



(Shay Banon) #10

OK, can you send me the logs so I can have a look?



(Szymon Gwóźdź) #11

The logs (from all 4 nodes) are attached.



(Shay Banon) #12

Can you send the configuration you use as well? I think you have a
misconfiguration in the gateway...



(Shay Banon) #13

Also, can you try the latest master? I think I have fixed a problem where
the master was detected as failed without waiting for the complete timeout
(it basically pings the master with a timeout of 6 seconds, up to 5 times,
before deciding that it has failed).
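
As a rough illustration of the retry logic described above (not elasticsearch's actual implementation), the following Python sketch declares the master failed only after every ping attempt has gone unanswered. The function names and the ping callable are hypothetical, and any pause between pings is ignored.

```python
def detect_master_failure(ping, retries=5, timeout_s=6.0):
    """Return True only if all `retries` ping attempts fail.

    `ping` is a hypothetical callable that takes a timeout in seconds and
    returns True when the master answered within that time.
    """
    for _ in range(retries):
        if ping(timeout_s):
            return False  # the master answered, so it is still considered alive
    return True  # every attempt timed out or failed


def worst_case_detection_seconds(retries=5, timeout_s=6.0):
    """Upper bound on the time spent pinging before the master is declared failed."""
    return retries * timeout_s


print(worst_case_detection_seconds())  # 30.0
```

With the numbers from this thread, a node should spend up to about 30 seconds pinging before it gives up on the master; the bug mentioned above cut that wait short.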

Another thing that I saw is that you create several indices. Can you check
if the JVMs are starting to struggle when it comes to memory? If you are
familiar with Java, can you open visualvm or jconsole against them to
check (the data nodes are the interesting ones)?

Last, I need that configuration; I think it has the wrong settings for the
gateway.

cheers,
shay.banon



(Shay Banon) #14

I just fixed a major bug which caused the transport layer in elasticsearch
to stop working (against a specific node):
http://github.com/elasticsearch/elasticsearch/issues/#issue/170. This might
explain the reason things stopped working completely for you.

I still would like to look at your configuration regarding the gateway.

cheers,
shay.banon



(Szymon Gwóźdź) #15

Hi!

This is the configuration I used:

cluster:
  name: fts060

node:
  data: true

http:
  enabled: false

path:
  work: /var/es-test/

discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    tcpping:
      initial_hosts: 10.1.49.90[9700], 10.1.49.91[9700], 10.1.49.88[9700], 10.1.49.89[9700]

gateway:
  type: fs
  fs:
    location: /mnt/storage0/es-test

index:
  gateway:
    type: fs
    fs:
      location: /mnt/storage0/es-test

There is also another problem in this situation. While putting data
simultaneously from a few machines (with about 20 threads putting data per
machine), ES starts to behave strangely after some time (5-15 minutes): from
time to time the server does not respond immediately to requests, but instead
goes 5-60 seconds without responding to any PUT request. Changing
threadpool.cached.scheduled_size from 20 to 100 doesn't change anything.

cheers,
Szymon Gwóźdź



(Shay Banon) #16

So, your configuration is not correct for the gateway. You should remove the
index gateway configuration part, since with it all indices will use
the same location. If you remove it, the indices will automatically use the
fs gateway, since it is configured on the gateway itself, and each will have
its own location. With the configuration you have now, each index will
overwrite the others.
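
Following that advice, the gateway part of the posted configuration would reduce to something like the fragment below. This is a sketch based only on the settings shown in this thread, with the nesting assumed from the key order in the original post; it has not been checked against a running cluster.

```yaml
# Node-level gateway only. The separate index.gateway block is removed so
# that each index gets its own location under the fs gateway.
gateway:
  type: fs
  fs:
    location: /mnt/storage0/es-test
```

The rest of the configuration (cluster, node, http, path, discovery) stays as posted.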

Regarding the pauses, I think that you are overloading elasticsearch
with all the indices you create. Can you monitor the garbage collection
on the JVM (using jconsole or visualvm)?

Shay


