River not being migrated to new node on service restart (ES 0.15.0)


(charles@dyfis.net) #1

Howdy! I created an AMQP river to feed in log events. It successfully
establishes a connection, and all works well... until I restart my
node.

When that happens, the river stays attached to the old node name,
rather than being started up anew. Thus, I end in a situation like the
following:

$ curl http://127.0.0.1:9200/_river/logstash_events/_status?pretty=true
{
  "_index" : "_river",
  "_type" : "logstash_events",
  "_id" : "_status",
  "_version" : 5, "_source" : {"ok":true,"node":{"id":"mN0roKFeTOWNUVpq4mQZqg","name":"Captain Fate","transport_address":"inet[/10.0.2.15:9300]"}}
}
$ curl http://127.0.0.1:9200/_cluster/state?pretty=true
...
  "nodes" : {
    "tklLqDDSRseHvn9Eq2V3Zw" : {
      "name" : "Potts, Virginia \"Pepper\"",
      "transport_address" : "inet[/10.0.2.15:9300]",
      "attributes" : {
      }
    }
...

...notably, the river is still associated with Captain Fate, despite
Captain Fate having passed away in favor of Virginia "Pepper" Potts.

I can get things unstuck by deleting and recreating the river whenever
the condition occurs, but that's a fair sight less than graceful. Is
there a better approach?

Thanks!
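
For anyone who wants to detect this condition from a script, the comparison above (node id in the river's _status versus the node ids in the cluster state) can be sketched roughly as follows. This is only an illustration: river_node_id and river_is_stale are made-up helper names, the id is pulled out with grep/sed rather than a real JSON parser, and it assumes the "node":{"id":"..."} layout shown in the _status output above and a grep that supports -o.

```shell
#!/bin/sh
# Sketch only: river_node_id and river_is_stale are hypothetical helper
# names, not part of Elasticsearch.

# Read a river's _status document on stdin and print the id of the node
# it is recorded against. Assumes the "node":{"id":"..."} layout shown above.
river_node_id() {
    grep -o '"node" *: *{ *"id" *: *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# $1: body of GET /_river/<name>/_status
# $2: body of GET /_cluster/state
# Returns 0 if the recorded node id does not appear in the cluster state
# (stale), 1 if it does appear, 2 if no node id could be extracted.
river_is_stale() {
    id=$(printf '%s\n' "$1" | river_node_id)
    [ -n "$id" ] || return 2
    if printf '%s\n' "$2" | grep -qF "\"$id\""; then
        return 1
    fi
    return 0
}
```

To use it, fetch the two documents with curl -s into shell variables and call river_is_stale "$status" "$state"; an exit status of 0 means the river is still recorded against a node that is no longer in the cluster.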


(charles@dyfis.net) #2

Actually -- this looks to be trickier than I previously expected:
DELETE'ing a river (even restarting after doing so) is not always
sufficient to make it re-add'able under the same name, though shifting
to a different name works.

netstat --tcp -ep | grep java | grep amqp

tcp 0 0 logs.vguest:41506 mon1.vguest:amqp ESTABLISHED elasticsearch 146425 20600/java
$ curl -XDELETE 'localhost:9200/_river/logstash_events_201102220651/_meta'
{"ok":true,"found":true,"_index":"_river","_type":"logstash_events_201102220651","_id":"_meta","_version":2}

netstat --tcp -ep | grep java | grep amqp

tcp 0 0 logs.vguest:41506 mon1.vguest:amqp ESTABLISHED elasticsearch 146425 20600/java

sv t /service/elasticsearch ## restart the service

netstat --tcp -ep | grep java | grep amqp ## the amqp connection is gone now
$ curl -v -XPUT 'localhost:9200/_river/logstash_events_201102220700/_meta' -d '
{
  "type" : "rabbitmq",
  "rabbitmq" : {
    "host" : "192.168.123.8",
    "port" : 5672,
    "user" : "logstash",
    "pass" : "logstash",
    "vhost" : "logstash",
    "exchange" : "parsed_logs",
    "exchange_type" : "fanout",
    "exchange_durable": "true",
    "queue" : "elasticsearch",
    "queue_durable": "true",
    "routing_key" : "elasticsearch"
  },
  "index" : {
    "bulk_size" : 100,
    "bulk_timeout" : "10ms"
  }
}'

netstat --tcp -ep | grep java | grep amqp ## ...but hey, it's still gone...
$ curl -v -XPUT 'localhost:9200/_river/logstash_events_201102220702/_meta' -d '
{
  "type" : "rabbitmq",
  "rabbitmq" : {
    "host" : "192.168.123.8",
    "port" : 5672,
    "user" : "logstash",
    "pass" : "logstash",
    "vhost" : "logstash",
    "exchange" : "parsed_logs",
    "exchange_type" : "fanout",
    "exchange_durable": "true",
    "queue" : "elasticsearch",
    "queue_durable": "true",
    "routing_key" : "elasticsearch"
  },
  "index" : {
    "bulk_size" : 100,
    "bulk_timeout" : "10ms"
  }
}'

netstat --tcp -ep | grep java | grep amqp

tcp 0 0 logs.vguest:56203 mon1.vguest:amqp ESTABLISHED elasticsearch 148361 21250/java

...until we created it under a new name, at which time it works fine.


(charles@dyfis.net) #3

Downgrading to 0.14.x, behavior is unchanged -- with the exception
that I get an error message during startup which at least provides an
appearance of making some sense of the situation:

2011-02-22_13:59:49.62013 [13:59:49,619][WARN ][river.routing ] [Ultra-Marine] failed to get/parse _meta for [logstash_events_201102220700]
org.elasticsearch.action.NoShardAvailableActionException: [_river]3] No shard available for [logstash_events_201102220700#_meta]

...frankly, while I'm still not able to have a river persist through
cluster restart, getting at least an error message is good for
morale. :)



(Shay Banon) #4

Heya,

Found the problem: it relates to restoring rivers when there is only a single node in the cluster. Fixed it: https://github.com/elasticsearch/elasticsearch/issues/711. You can work around it by "forcing" a cluster change, either by adding another node or by creating a dummy index (and deleting it afterwards). Hackish, I know, but the fix will be part of 0.15.1.

-shay.banon
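
The dummy-index workaround described above might look something like the sketch below. The function name force_cluster_change, the ES_URL and DUMMY_INDEX variables, and the index name river_kick_dummy are all invented for this example; the only Elasticsearch calls used are the plain create-index (PUT) and delete-index (DELETE) requests. Whether this is enough to get the river allocated again on 0.15.0 rests on the description above and has not been verified here.

```shell
#!/bin/sh
# Sketch only: force_cluster_change, ES_URL and DUMMY_INDEX are hypothetical
# names chosen for this example.
ES_URL="${ES_URL:-http://localhost:9200}"
DUMMY_INDEX="${DUMMY_INDEX:-river_kick_dummy}"

# Create a throwaway index and delete it again, so that the cluster
# state changes after the node has restarted.
force_cluster_change() {
    curl -s -XPUT "$ES_URL/$DUMMY_INDEX/" || return 1
    echo
    curl -s -XDELETE "$ES_URL/$DUMMY_INDEX/" || return 1
    echo
}
```

Calling force_cluster_change once after the node has come back up (for example right after sv t /service/elasticsearch) is the intended use; pick a DUMMY_INDEX name that cannot collide with a real index, since the function deletes it.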

