All nodes except Master show as "Offline"

monitoring

(Chris M) #1

Running

  • Elasticsearch 2.3.0
  • Kibana 4.5.0
  • Marvel-agent 2.3.0

Description

The Nodes page in Marvel shows all 4 nodes in my cluster, with proper metadata about each (hostname, IP address, name). However, all nodes except the master show as "offline". The master has appropriate data about disk usage, CPU, etc., but the other nodes just show as "offline".

Extra Notes

  • Upgrading marvel-agent from 2.2.1 to 2.3.0 caused Marvel to start storing data in the index pattern .marvel-es-1-*, whereas previously data was stored in .marvel-es-* (see the quick check after this list).
  • Nodes that are offline for longer than the time bounds in Marvel normally fall off the screen, so some metadata must still be being reported by these nodes, which allows them to continue to be displayed.
  • Clicking on the name of any of the nodes (including the master) redirects to the root of Marvel instead of a node detail page.
  • No exceptions in the Elasticsearch or Kibana logs.
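
A quick way to see both index patterns side by side (a sketch using one of my hosts; adjust for your own cluster):

curl -XGET "http://elk1.cc.pdrop.net:9200/_cat/indices/.marvel-es-*?v"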

Screenshot

[screenshot of the Marvel Nodes page not preserved]

(Chris Earle) #2

Hi Chris (great name),

Can you show the output of

curl -XGET host:9200/_cat/plugins?v

from any of your nodes?

Thanks,
Chris


(Chris M) #3

Thanks Chris :D,

curl -XGET "http://elk1.cc.pdrop.net:9200/_cat/plugins?v=1"
name    component    version type url
Apophis license      2.3.0   j
Apophis marvel-agent 2.3.0   j
Anubis  license      2.3.0   j
Anubis  marvel-agent 2.3.0   j
Hathor  license      2.3.0   j
Hathor  marvel-agent 2.3.0   j
Nirrti  license      2.3.0   j
Nirrti  marvel-agent 2.3.0   j

(Chris Earle) #4

Just to be sure, can you verify that you have the proper version of the Kibana plugin?

$ grep version installedPlugins/marvel/package.json

If that all matches up, then the next step is to crack open the index to start to see what's there and not there.
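
For example (host is a placeholder), pulling back a single raw document will show whether node_stats documents are being written at all:

curl -XGET "host:9200/.marvel-es-1-*/node_stats/_search?size=1&pretty"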

Thanks,
Chris


(Chris M) #5

Hey Chris,
Thanks for your help:

root@500514ad55df:/opt/kibana# grep version installedPlugins/marvel/package.json
 "version": "2.3.0",

(Jared Kauppila) #6

I'm seeing the same thing as Chris, except that I can click on the names of my nodes and get the current stats on them (although I can't see the shard allocations on the nodes that are not master).

I'm running Shield, but I'm using the local exporter in Marvel.


(Chris M) #7

I'm not running Shield, just all the free stuff. :\


(Jared Kauppila) #8

I turned on DEBUG for Marvel, but I'm not seeing any errors on any of the nodes, just the following:

[2016-04-05 16:27:32,187][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:32,187][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:54,391][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:54,391][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:55,594][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:55,594][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:55,797][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:55,797][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:56,000][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:56,000][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]

Looking within the .marvel-es-1-* index, it appears that I'm getting data from each of the nodes, but that it's just not being displayed on the node overview page.
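
For anyone who wants to run the same check, something along these lines should work (host is a placeholder), bucketing node_stats documents by node name:

curl -XPOST "host:9200/.marvel-es-1-*/node_stats/_search?pretty" -d '{
  "size": 0,
  "aggs": {
    "nodes": { "terms": { "field": "source_node.name" } }
  }
}'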


(Sébastien Barut) #9

Hi, same problem for me after migrating from 2.2 to 2.3.1.


(Chris Earle) #10

@tebriel: Since you were the first to report it, let's continue with your data as it sounds like it may be a wider issue.

Can you run this query against your cluster and attach the response?

GET /.marvel-es-1*/node_stats/_search
{
   "size" : 0,
   "aggs" : {
      "nodes" : {
         "date_histogram" : {
            "interval" : "10s",
            "field" : "timestamp",
            "order" : {
               "_key" : "desc"
            },
            "min_doc_count" : 1
         },
         "aggs" : {
            "source_node_name" : {
               "terms" : {
                  "field" : "source_node.name"
               },
               "aggs" : {
                  "source_node_transport_address" : {
                     "terms" : {
                        "field" : "source_node.transport_address"
                     }
                  }
               }
            }
         }
      }
   }
}

(Chris M) #11

Thanks!

{
  "_shards": {
    "failed": 0,
    "successful": 6,
    "total": 6
  },
  "hits": {
    "hits": [
      {
        "_id": "AVPS6Hs_HfEfRmF9lCyp",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPS6A-6HfEfRmF9lCoZ",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlVhXir2ZpyJFW7VZ",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlgxcir2ZpyJFW7Vc",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlnQ6ir2ZpyJFW7Ve",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlOzsir2ZpyJFW7VX",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlSQAir2ZpyJFW7VY",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlY0yir2ZpyJFW7Va",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlc2yir2ZpyJFW7Vb",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlj3rir2ZpyJFW7Vd",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      }
    ],
    "max_score": 1.0,
    "total": 122720
  },
  "timed_out": false,
  "took": 696
}

(Chris Earle) #12

@tebriel: Can you try resending the request as a POST? It looks like whatever tool you used stripped out the request's body (web browsers do not natively support sending GET requests with a body).
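
For example, with curl (reusing the elk1 host from your earlier output), which does send a body on POST:

curl -XPOST "http://elk1.cc.pdrop.net:9200/.marvel-es-1*/node_stats/_search?pretty" -d '{
   "size" : 0,
   "aggs" : {
      "nodes" : {
         "date_histogram" : {
            "interval" : "10s",
            "field" : "timestamp",
            "order" : { "_key" : "desc" },
            "min_doc_count" : 1
         },
         "aggs" : {
            "source_node_name" : {
               "terms" : { "field" : "source_node.name" },
               "aggs" : {
                  "source_node_transport_address" : {
                     "terms" : { "field" : "source_node.transport_address" }
                  }
               }
            }
         }
      }
   }
}'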


(Chris Earle) #13

@Jakauppila: To try a different approach: you are clearly naming your nodes. Can you try adding this setting to your Kibana configuration and then restarting Kibana, to see if it resolves the issue?

marvel.node_resolver: name

For anyone interested, the marvel.node_resolver setting was added in Marvel 2.3. The default -- and only other -- value for it is transport_address.
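
In other words, the default behavior is equivalent to explicitly setting:

marvel.node_resolver: transport_address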

I have had no luck reproducing this locally, but I want to see if this at least resolves it, which would help point to the problem.


(Jared Kauppila) #14

That looks to have fixed my problem! I can see the node info on the overview page, as well as the index/shard info when clicking into each node.


(Chris M) #15

So, I reran the query as a POST (duh, sorry, I was in a meeting and didn't think about the fact that my REST client won't send a body on a GET). The response is huge.

I'll try using the node_resolver next.
Thanks


(Chris Earle) #16

Hi @tebriel,

Given that your nodes have unique, static names (I had to check that they weren't in the default list, because Greek god names are so close sometimes!), I'm hopeful that it also fixes it for you. I'm looking over the data, though.

Thanks!


(Chris M) #17

To help debug, this is how I launch Elasticsearch inside a Docker container:

elasticsearch \
  -Des.cluster.name="pindrop_elk" \
  -Des.discovery.zen.ping.unicast.hosts="elk1.cc.pdrop.net, elk2.cc.pdrop.net, elk3.cc.pdrop.net, elk4.cc.pdrop.net" \
  -Des.node.name="Apophis" \
  -Des.network.bind_host="0.0.0.0" \
  -Des.network.publish_host=elk1.cc.pdrop.net \
  -Des.node.master=true \
  -Des.node.data=true \
  -Des.path.data=/usr/share/elasticsearch/data

(Chris M) #18

Yes, using the node_resolver does fix the issue for now, thanks!


(Chris Earle) #19

Awesome. As long as your transport address isn't changing (I noticed that it changed for Jared), it shouldn't be required for you to use this setting, but I'm glad that it resolves the issue.

We're digging into the root cause, and we'll hopefully have a fix out in the next release.


(Chris M) #20

Sweet, thanks! I'm pushing the updated kibana config out. Appreciate your help.