All nodes except Master show as "Offline"

I'm seeing the same thing as Chris, except that I can click on the names of my nodes and get the current stats on them (although I can't see the shard allocations on the nodes that are not master).

I'm running Shield, but I'm using the local exporter for Marvel.

I'm not running Shield, just all the free stuff. :\

I turned on DEBUG for Marvel, but I'm not seeing any errors on any of the nodes, just the following:

[2016-04-05 16:27:32,187][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:32,187][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:54,391][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:54,391][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:55,594][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:55,594][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:55,797][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:55,797][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:56,000][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:56,000][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
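
(For reference, DEBUG logging for the Marvel agent can be turned on without a restart via a transient logger setting; this is only a sketch, the host and port are placeholders, and the equivalent can also be set as marvel.agent: DEBUG in logging.yml.)

# Enable DEBUG logging for the Marvel agent on the fly (host/port is a placeholder).
curl -s -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "logger.marvel.agent": "DEBUG" }
}'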

Looking within the .marvel-es-1-* indices, it appears that I'm getting data from each of the nodes, but it's just not being displayed on the node overview page.
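
For reference, a quick way to sanity-check that is a terms aggregation on source_node.name (a rough sketch; the host and port are placeholders for your own cluster):

# Count node_stats documents per source node (host/port is a placeholder).
curl -s -XPOST 'http://localhost:9200/.marvel-es-1-*/node_stats/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "per_node": {
      "terms": { "field": "source_node.name" }
    }
  }
}'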

Hi, same problem for me after migrating from 2.2 to 2.3.1.

@tebriel: Since you were the first to report it, let's continue with your data as it sounds like it may be a wider issue.

Can you run this query against your cluster and attach the response?

GET /.marvel-es-1*/node_stats/_search
{
   "size" : 0,
   "aggs" : {
      "nodes" : {
         "date_histogram" : {
            "interval" : "10s",
            "field" : "timestamp",
            "order" : {
               "_key" : "desc"
            },
            "min_doc_count" : 1
         },
         "aggs" : {
            "source_node_name" : {
               "terms" : {
                  "field" : "source_node.name"
               },
               "aggs" : {
                  "source_node_transport_address" : {
                     "terms" : {
                        "field" : "source_node.transport_address"
                     }
                  }
               }
            }
         }
      }
   }
}

Thanks!

{
  "_shards": {
    "failed": 0,
    "successful": 6,
    "total": 6
  },
  "hits": {
    "hits": [
      {
        "_id": "AVPS6Hs_HfEfRmF9lCyp",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPS6A-6HfEfRmF9lCoZ",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlVhXir2ZpyJFW7VZ",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlgxcir2ZpyJFW7Vc",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlnQ6ir2ZpyJFW7Ve",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlOzsir2ZpyJFW7VX",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlSQAir2ZpyJFW7VY",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlY0yir2ZpyJFW7Va",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlc2yir2ZpyJFW7Vb",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlj3rir2ZpyJFW7Vd",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      }
    ],
    "max_score": 1.0,
    "total": 122720
  },
  "timed_out": false,
  "took": 696
}

@tebriel: Can you try resending the request as a POST? It looks like whatever tool you used stripped out the request's body (web browsers do not natively support sending GET requests with a body).
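
For example, with curl (localhost:9200 is a placeholder for your cluster, and node_stats_query.json is just a hypothetical file holding the body of the query above):

# Re-send the same search as an explicit POST so the request body is not dropped.
curl -s -XPOST 'http://localhost:9200/.marvel-es-1*/node_stats/_search?pretty' -d @node_stats_query.json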

@Jakauppila: To try a different approach, since you are clearly naming your nodes, can you try adding this setting to your Kibana configuration and then restarting Kibana to see if it resolves the issue?

marvel.node_resolver: name

For anyone interested, the marvel.node_resolver setting was added in Marvel 2.3. The default -- and only other -- value for it is transport_address.

I have had no luck reproducing this locally, but I want to see if this at least resolves it, which would help point to where the problem lies.
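
In case it helps, the change amounts to one line in kibana.yml plus a Kibana restart. A sketch only; the config path and service name below are examples and will vary by install:

# Append the setting to Kibana's config, then restart Kibana.
# The kibana.yml path and the service name are examples; adjust for your install.
echo 'marvel.node_resolver: name' >> /opt/kibana/config/kibana.yml
sudo service kibana restart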

That looks to have fixed my problem! I can see the node info on the overview page, as well as the index/shard info when clicking into each node.

So, I reran the query as a POST (duh, sorry, I was in a meeting and didn't consider that my REST client won't send a body on a GET). The response is huge.

I'll try using the node_resolver setting next.
Thanks

Hi @tebriel,

Given that your nodes have unique, static names (I had to check that they weren't in the default list, because Greek god names are sometimes so close!), I'm hopeful that it also fixes it for you. I'm still looking over the data, though.

Thanks!

To help with debugging, this is how I launch Elasticsearch inside a Docker container:

elasticsearch \
  -Des.cluster.name="pindrop_elk" \
  -Des.discovery.zen.ping.unicast.hosts="elk1.cc.pdrop.net, elk2.cc.pdrop.net, elk3.cc.pdrop.net, elk4.cc.pdrop.net" \
  -Des.node.name="Apophis" \
  -Des.network.bind_host="0.0.0.0" \
  -Des.network.publish_host=elk1.cc.pdrop.net \
  -Des.node.master=true \
  -Des.node.data=true \
  -Des.path.data=/usr/share/elasticsearch/data

Yes, using the node_resolver setting does fix the issue for now, thanks!

Awesome. As long as your transport address isn't changing (I noticed that it changed for Jared), you shouldn't need to use this setting, but I'm glad that it resolves the issue.

We're digging into the root cause, and we'll hopefully have a fix out in the next release.

Sweet, thanks! I'm pushing the updated Kibana config out. Appreciate your help.

Whoops, sorry, the transport address didn't change; I had just run the query against our Prod cluster the second time. I changed the node names with Firebug since we just use the server names.

Aw. Well, there goes that theory. At least we know there's a workaround. :slight_smile:

Well, it only partially works. Shards per node and Unassigned Shard Count both show 0 (I just upgraded the last of the 4 nodes to 2.3.1).

Screenshot

Just to add that we faced the same issue on our 9-node cluster, and the node_resolver setting fixed it completely.
Thanks.

@tebriel, I do not think that issue is related. Do you mind creating a separate Discuss post for it if it's still happening?

I have tracked down the issue that is causing this problem, and I expect to have the fix backported to v2.3.2, v2.4.0, and of course v5, barring some unexpected issue.