I've launched a 4-machine ES cluster. 2 machines are data nodes, the other 2
machines are non-data nodes.
I've started putting data (via a prepared script) from 5 other machines. A lot
of data (about 40 documents (~200 kB) per second from each machine).
After about a couple of minutes I get a log like this:
[11:17:25,431][WARN ][transport.netty ] [Cold War] Exception caught on netty layer [[id: 0x03435ec9, /10.1.49.90:49601 => fts-test-4/10.1.49.89:9300]]
org.elasticsearch.transport.ResponseHandlerNotFoundTransportException: Transport response handler not found of id [166613]
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:71)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:391)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:345)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:332)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:323)
    at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:275)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:196)
    at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
And this server crashes (because another node takes the "Master" position).
There is enough available memory. What is the problem?
Sadly, this exception is caused by another problem that happened on another
node. Which version are you running? If you can try with master (or wait a
couple days for 0.7) that would be great.
I just downloaded it and got a file named elasticsearch-elasticsearch-a0b25ec.
It's from here: http://www.elasticsearch.com/download/master/. Also, the
version is printed in the logs / console when elasticsearch starts up.
That's the name of the downloaded file. Even now, when I choose to download master, the name of the downloaded file contains "v0.6.0".
That should be OK. A string like "v0.6.0-132-g040030d" is the output of "git describe", which shows the most recent tag (which is 0.6.0 because 0.7.0 hasn't been released yet), followed by the number of commits since then, followed by the short hash that identifies the latest commit.
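To make the three parts concrete, here is a small sketch that splits such a string apart. The function name parse_git_describe is made up for illustration. The leading "g" before the hash is a prefix that "git describe" adds and is not part of the hash itself.

```python
import re

# `git describe` output: <tag>-<commits since tag>-g<short hash>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_git_describe(text):
    """Split a `git describe` string into (tag, commits_since_tag, short_hash)."""
    text = text.strip()
    match = DESCRIBE_RE.match(text)
    if match is None:
        # A commit that sits exactly on a tag is printed as the bare tag name.
        return text, 0, None
    return match.group("tag"), int(match.group("commits")), match.group("sha")


print(parse_git_describe("v0.6.0-132-g040030d"))
# -> ('v0.6.0', 132, '040030d')
```

So a build named this way is 132 commits past the 0.6.0 tag, at commit 040030d.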
Also, can you try the latest master? I think I have fixed a problem where
the master was detected as failed without waiting for the complete timeout
for it (it basically pings with a timeout of 6 seconds for 5 times till it
is decided that it has failed).
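As a rough illustration of the intended behaviour described above (this is not the actual elasticsearch code, and the names master_has_failed and ping are made up), the check amounts to: only give up on the master after every one of the 5 pings has timed out.

```python
def master_has_failed(ping, timeout_s=6.0, retries=5):
    """Return True only if `retries` pings in a row get no reply.

    `ping` is a hypothetical callable, ping(timeout_s) -> bool, that returns
    False when no reply arrives within the timeout.
    """
    for _ in range(retries):
        if ping(timeout_s):
            return False  # one successful ping is enough: master is alive
    return True  # every attempt timed out, so the master is declared failed


# A master that never answers is only declared failed after all 5 attempts.
print(master_has_failed(lambda timeout_s: False))  # -> True
# A master that answers is never declared failed.
print(master_has_failed(lambda timeout_s: True))   # -> False
```

With these numbers a dead master is detected after roughly 5 x 6 = 30 seconds, ignoring any pause between pings.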
Another thing that I saw is that you create several indices. Can you check
if the JVMs are starting to struggle when it comes to memory? If you are
familiar with Java, can you open VisualVM or jconsole against them to
check it (the data nodes are the interesting ones)?
Last, I need that configuration; I think it has the wrong configuration for
the gateway.
I just fixed a major bug which caused the transport layer in elasticsearch
to stop working (against a specific node): Issues · elastic/elasticsearch · GitHub. This might
explain the reason things stopped working completely for you.
I still would like to look at your configuration regarding the gateway.
There is another problem in this situation. While putting data
simultaneously from a few machines (with about 20 threads putting data per
machine), ES starts to behave weirdly after some time (5-15 minutes): from
time to time the server does not respond immediately to requests, but has
5-60 seconds of "no-responding" to any PUT request. Changing
threadpool.cached.scheduled_size from 20 to 100 doesn't change anything.
So, your configuration is not correct for the gateway. You should remove the
index gateway configuration part, since in this case, all indices will use
the same location. If you remove it, the indices will automatically use the
fs gateway since it is configured on the gateway itself, and each will have
its own location. With the configuration you have now, each index will
override the other one.
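To show why a shared location is a problem, here is a toy model only. It is not elasticsearch code: the path /mnt/gateway and the per-index layout underneath it are assumptions made up for the example. A dict stands in for the shared file system, keyed by location.

```python
def snapshot_all(indices, location_for):
    """Toy gateway: store each index's snapshot under the location chosen for it."""
    storage = {}
    for name, state in indices.items():
        # Writing to a location that is already used replaces what was there.
        storage[location_for(name)] = state
    return storage


indices = {"index_a": "state of A", "index_b": "state of B"}

# Index-level gateway config that points every index at the same location.
shared = snapshot_all(indices, lambda name: "/mnt/gateway")

# No index-level config: each index ends up with its own location.
per_index = snapshot_all(indices, lambda name: "/mnt/gateway/" + name)

print(len(shared), len(per_index))  # -> 1 2
```

In the shared case only the last index written survives, which is the "each index will override the other one" effect described above.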
Regarding the pauses, I think that you are overloading elasticsearch too
much with all the indices you create. Can you monitor the garbage collection
on the JVM (using jconsole or visualvm)?