Errors: MasterNotDiscoveredException, IndexMissingException

Hi,

I had stored documents previously, and everything worked smoothly for the last few days.
The JSON documents have the following format:
"user": "123",
"file_name": "gftgf",
"file_extension": "xls",
"path":"temp"
The installation is on Ubuntu, running under the service wrapper.

Today I ran the following queries:
test@Aditya ~
$ curl -XPOST 'http://localhost:9200/storage/fileinformation/_search?pretty=true' -d '
{
"query": { "match_all": {} }
}'
{
"error" : "IndexMissingException[[fbdata] missing]",
"status" : 404
}

$ curl -XPOST 'http://localhost:9200/storage/fileinformation/' -d '
{
"user": "123",
"file_name": "gftgf",
"file_extension": "xls",
"path":"temp"
}'

{"error":"MasterNotDiscoveredException[]","status":500}
test@Aditya ~
$

When I checked, the data folder is still present, with a storage subfolder and the index files in it.

Cluster Settings

cluster:
  name: datastore

Gateway Settings

gateway:
  recover_after_nodes: 1
  recover_after_time: 5m
  expected_nodes: 2

Please note: although expected_nodes is set to 2, in reality the test setup has only one node.
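
For reference, the way the three gateway settings interact can be sketched roughly as below. This is a simplified model based on the documented meaning of the settings, not Elasticsearch's actual code; `GatewaySettings` and `should_recover` are made-up names used only for illustration. Under this model, a single node with expected_nodes set to 2 would hold back state recovery until recover_after_time has passed.

```python
from dataclasses import dataclass


@dataclass
class GatewaySettings:
    recover_after_nodes: int
    recover_after_time_s: float  # recover_after_time, expressed in seconds
    expected_nodes: int


def should_recover(settings: GatewaySettings, nodes_joined: int, waited_s: float) -> bool:
    """Simplified model: True once state recovery would be allowed to start."""
    if nodes_joined < settings.recover_after_nodes:
        return False  # hard minimum not met; keep waiting regardless of time
    if nodes_joined >= settings.expected_nodes:
        return True  # every expected node has joined; no need to wait
    # Minimum met but not all expected nodes present: wait out the timer.
    return waited_s >= settings.recover_after_time_s


# Settings as posted above (5m = 300 seconds), and a single-node variant.
posted = GatewaySettings(recover_after_nodes=1, recover_after_time_s=300, expected_nodes=2)
single = GatewaySettings(recover_after_nodes=1, recover_after_time_s=300, expected_nodes=1)

print(should_recover(posted, nodes_joined=1, waited_s=60))   # False
print(should_recover(posted, nodes_joined=1, waited_s=300))  # True
print(should_recover(single, nodes_joined=1, waited_s=0))    # True
```

If this model holds for the version in use, setting expected_nodes to 1 on a one-node test box would avoid the wait. Whether that has anything to do with the errors above is not established here.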

Q1: Since I can see the files are present, please advise: how do I get them back in action and make them usable with the current instance?

Q2: I would also appreciate any advice on how to avoid this problem in real deployment scenarios.

Best Regards,
Aditya

Small typo in the settings: the cluster name is data, not datastore. (I copied that while writing here from a local setup where I am trying to reproduce the problem on another machine :( )
Apologies!

-a

Are you running just a single node and still getting these exceptions? Can you gist the cluster state in this case?
On Monday, March 21, 2011 at 2:43 PM, aditya.kulkarni wrote:


View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Errors-MasterNotDiscoveredException-IndexMissingException-tp2709737p2709747.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

Hey Shay,

It is not a cluster, it is a single machine. As mentioned, it is a test setup.

Best Regards,
Aditya

Sure, can you gist the cluster state response?
On Monday, March 21, 2011 at 4:05 PM, aditya.kulkarni wrote:


Here it is:

$ curl -XGET 'http://localhost:9200/_cluster/state'
{"error":"MasterNotDiscoveredException[]","status":500}

Can you gist the log of that node? You might need to restart it.
On Monday, March 21, 2011 at 4:41 PM, aditya.kulkarni wrote:


Hey Shay,

Here are logs:

ES Log from when it worked fine last time:
[2011-03-18 11:34:01,025][INFO ][node ] [Geb] {elasticsearch/0.15.2}[1086]: initializing ...
[2011-03-18 11:34:01,027][INFO ][plugins ] [Geb] loaded []
[2011-03-18 11:34:03,211][INFO ][node ] [Geb] {elasticsearch/0.15.2}[1086]: initialized
[2011-03-18 11:34:03,211][INFO ][node ] [Geb] {elasticsearch/0.15.2}[1086]: starting ...
[2011-03-18 11:34:03,263][INFO ][transport ] [Geb] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.103:9300]}
[2011-03-18 11:34:06,285][INFO ][cluster.service ] [Geb] new_master [Geb][DL5ARuvKQhekQluSVd8piA][inet[/192.168.1.103:9300]], reason: zen-disco-join (elected_as_master)
[2011-03-18 11:34:06,324][INFO ][discovery ] [Geb] datastore/DL5ARuvKQhekQluSVd8piA
[2011-03-18 11:34:06,445][INFO ][http ] [Geb] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.103:9200]}
[2011-03-18 11:34:06,445][INFO ][node ] [Geb] {elasticsearch/0.15.2}[1086]: started
[2011-03-18 11:39:07,036][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true
[2011-03-18 11:39:07,036][INFO ][cluster.metadata ] [Geb] [pretty=true] creating index, cause [gateway], shards [5]/[1], mappings []
[2011-03-18 11:39:07,328][INFO ][cluster.metadata ] [Geb] [storage] creating index, cause [gateway], shards [5]/[1], mappings [100000369445615, 529671604, 690882346, 526489848, 537909592, 343, fileinformation, 344, 345, 1484242128, 100002154019923, 552059847, 638109632, 100001435509870, 528006034, user]
[2011-03-18 11:39:07,576][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true
[2011-03-18 11:39:07,678][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true,subService=shards,shard=0
[2011-03-18 11:39:07,678][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true,subService=shards,shard=0,shardType=store
[2011-03-18 11:39:07,695][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true,subService=shards,shard=1
[2011-03-18 11:39:07,696][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true,subService=shards,shard=1,shardType=store
[2011-03-18 11:39:07,697][INFO ][cluster.metadata ] [Geb] [pretty=true] created and added to cluster_state
[2011-03-18 11:39:07,699][INFO ][cluster.metadata ] [Geb] [storage] created and added to cluster_state
[2011-03-18 11:39:07,766][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true,subService=shards,shard=2
[2011-03-18 11:39:07,766][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true,subService=shards,shard=2,shardType=store
[2011-03-18 11:39:07,779][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true,subService=shards,shard=3
[2011-03-18 11:39:07,779][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true,subService=shards,shard=3,shardType=store
[2011-03-18 11:39:07,798][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true,subService=shards,shard=4
[2011-03-18 11:39:07,798][WARN ][jmx ] [Geb] Could not register object with name: service=indices,index=pretty=true,subService=shards,shard=4,shardType=store

ES logs since it started failing:
[2011-03-21 15:00:31,509][INFO ][node ] [Master Pandemonium] {elasticsearch/0.15.2}[1082]: started
[2011-03-21 15:51:56,045][INFO ][node ] [Master Pandemonium] {elasticsearch/0.15.2}[1082]: stopping ...
[2011-03-21 15:51:56,063][INFO ][node ] [Master Pandemonium] {elasticsearch/0.15.2}[1082]: stopped
[2011-03-21 15:51:56,063][INFO ][node ] [Master Pandemonium] {elasticsearch/0.15.2}[1082]: closing ...
[2011-03-21 15:51:56,071][INFO ][node ] [Master Pandemonium] {elasticsearch/0.15.2}[1082]: closed
[2011-03-21 15:52:27,760][INFO ][node ] [Sunstreak] {elasticsearch/0.15.2}[1105]: initializing ...
[2011-03-21 15:52:27,773][INFO ][plugins ] [Sunstreak] loaded []
[2011-03-21 15:52:29,935][INFO ][node ] [Sunstreak] {elasticsearch/0.15.2}[1105]: initialized
[2011-03-21 15:52:29,936][INFO ][node ] [Sunstreak] {elasticsearch/0.15.2}[1105]: starting ...
[2011-03-21 15:52:29,987][INFO ][transport ] [Sunstreak] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/fe80:0:0:0:227:eff:fe2b:9c81%2:9300]}
[2011-03-21 15:53:00,001][WARN ][discovery ] [Sunstreak] waited for 30s and no initial state was set by the discovery
[2011-03-21 15:53:00,002][INFO ][discovery ] [Sunstreak] datastore/DCKl3rA1TES8Jk-N2zhi-g
[2011-03-21 15:53:00,114][INFO ][http ] [Sunstreak] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.103:9200]}
[2011-03-21 15:53:00,115][INFO ][node ] [Sunstreak] {elasticsearch/0.15.2}[1105]: started
[2011-03-21 16:12:25,845][WARN ][http.netty ] [Sunstreak] Caught exception while handling client http traffic, closing connection
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
at sun.nio.ch.IOUtil.read(IOUtil.java:169)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:321)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2011-03-21 19:18:49,630][WARN ][http.netty ] [Sunstreak] Caught exception while handling client http traffic, closing connection
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
at sun.nio.ch.IOUtil.read(IOUtil.java:169)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:321)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2011-03-21 20:30:14,099][INFO ][node ] [Sunstreak] {elasticsearch/0.15.2}[1105]: stopping ...
[2011-03-21 20:30:14,106][INFO ][node ] [Sunstreak] {elasticsearch/0.15.2}[1105]: stopped
[2011-03-21 20:30:14,106][INFO ][node ] [Sunstreak] {elasticsearch/0.15.2}[1105]: closing ...
[2011-03-21 20:30:14,114][INFO ][node ] [Sunstreak] {elasticsearch/0.15.2}[1105]: closed

Okay, something changed after today's restart, with no changes on my side. (I had already tried a restart after Shay's suggestion, but had no luck.)
$ curl -XGET 'http://192.168.1.103:9200/_cluster/state'
{
"cluster_name": "datastore",
"master_node": "-dwBDM7qRo6ZAYVTS5QtFg",
"blocks": {
"global": {
"1": {
"description": "state not recovered / initialized",
"retryable": true,
"disable_state_persistence": true,
"levels": [
"read",
"write",
"metadata"
]
}
}
},
"nodes": {
"-dwBDM7qRo6ZAYVTS5QtFg": {
"name": "Apryll",
"transport_address": "inet[/192.168.1.103:9300]",
"attributes": {}
}
},
"metadata": {
"templates": {},
"indices": {}
},
"routing_table": {
"indices": {}
},
"routing_nodes": {
"unassigned": [],
"nodes": {}
},
"allocations": [ ]
}
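
As a side note, the two kinds of response seen so far (the plain error body and a cluster state carrying a global block) can be told apart with a few lines of Python. This is only a sketch that parses response bodies shaped like the ones pasted in this thread; `global_blocks` is a made-up helper name, and the sample below is trimmed from the output above.

```python
import json
from typing import List


def global_blocks(cluster_state_json: str) -> List[str]:
    """Return descriptions of any global blocks in a /_cluster/state response body."""
    state = json.loads(cluster_state_json)
    if "error" in state:
        # e.g. {"error":"MasterNotDiscoveredException[]","status":500}
        return ["error: " + state["error"]]
    blocks = state.get("blocks", {}).get("global", {})
    return [block.get("description", "(no description)") for block in blocks.values()]


# Trimmed copy of the cluster state pasted above.
sample = """
{
  "cluster_name": "datastore",
  "blocks": {"global": {"1": {"description": "state not recovered / initialized",
                               "retryable": true,
                               "levels": ["read", "write", "metadata"]}}},
  "metadata": {"templates": {}, "indices": {}}
}
"""

print(global_blocks(sample))  # ['state not recovered / initialized']
```

An empty list would mean no global block is reported, while a non-empty one shows that reads and writes are still blocked at the cluster level.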

Now ES logs:
[2011-03-22 13:31:03,413][INFO ][node] [Phineas T. Horton] {elasticsearch/0.15.2}[1088]: closed
[2011-03-22 13:41:45,078][INFO ][node] [Shooting Star] {elasticsearch/0.15.2}[1070]: initializing ...
[2011-03-22 13:41:45,093][INFO ][plugins ] [Shooting Star] loaded []
[2011-03-22 13:41:47,475][INFO ][node ] [Shooting Star] {elasticsearch/0.15.2}[1070]: initialized
[2011-03-22 13:41:47,475][INFO ][node ] [Shooting Star] {elasticsearch/0.15.2}[1070]: starting ...
[2011-03-22 13:41:47,527][INFO ][transport ] [Shooting Star] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.103:9300]}
[2011-03-22 13:41:50,549][INFO ][cluster.service ] [Shooting Star] new_master [Shooting Star][ZfPmhc5NTzuQhTOdq5iGAA][inet[/192.168.1.103:9300]], reason: zen-disco-join (elected_as_master)
[2011-03-22 13:41:50,584][INFO ][discovery ] [Shooting Star] datastore/ZfPmhc5NTzuQhTOdq5iGAA
[2011-03-22 13:41:50,709][INFO ][http ] [Shooting Star] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.103:9200]}
[2011-03-22 13:41:50,710][INFO ][node] [Shooting Star] {elasticsearch/0.15.2}[1070]: started

ES is back. Steps taken:
Cleaned the data directory and replaced it with the backup from Mar-18.

I am still wondering what went wrong. Until the problem can be reproduced, I guess backing up the data every day (or every X hours) is a safe bet.

Just a note: for this kind of use case, where the installation is on a dedicated server and the index store is simplefs or mmapfs, an additional option to save backups to S3 should be available. This would remove the need to take backups manually or with external tools.
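
A minimal sketch of such a periodic backup, assuming a plain file-level copy of the data folder is acceptable and that the node is stopped (or at least idle) while it runs, since copying a live index may give an inconsistent snapshot. The function name and any paths passed to it are hypothetical; scheduling (for example from cron) is left out.

```python
import tarfile
import time
from pathlib import Path


def backup_data_dir(data_dir: str, backup_dir: str) -> Path:
    """Write data_dir into a timestamped .tar.gz under backup_dir and return its path."""
    source = Path(data_dir)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = target_dir / ("es-data-" + stamp + ".tar.gz")
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the data folder name
        tar.add(str(source), arcname=source.name)
    return archive


# Example call (paths are placeholders, adjust to the actual installation):
# backup_data_dir("/path/to/elasticsearch/data", "/path/to/backups")
```

Restoring is then the reverse of the steps above: stop the node, clean the data directory, and unpack the chosen archive in its place.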

Best Regards,
Aditya

Did you remove the gateway configuration by any chance? The recover_after and expected parameters?
On Tuesday, March 22, 2011 at 10:28 AM, aditya.kulkarni wrote:


Nope. Those were the same as mentioned before. That's what makes this a more interesting problem.

I am trying hard to reproduce the same problem on another machine in a different network. If I manage to reproduce it, I shall get back to you.

One thing I observed: when installed with the service wrapper to run in service mode (Ubuntu 10.04 LTS), it does not start automatically if the network is down at startup (RJ45 unplugged) and is brought up later.