All nodes except Master show as "Offline"

monitoring

(Chris M) #1

Running

  • Elasticsearch 2.3.0
  • Kibana 4.5.0
  • Marvel-agent 2.3.0

Description

The Nodes page in Marvel shows all 4 nodes in my cluster, with proper metadata about each (hostname, IP address, name). However, all nodes except the master show as "offline". The master has appropriate data about disk usage, CPU, etc., but the other nodes just show as "offline".

Extra Notes

  • Upgrading marvel-agent from 2.2.1 to 2.3.0 caused Marvel to start storing data in the index pattern .marvel-es-1-*, whereas previously data was stored in .marvel-es-* (see the quick check after this list).
  • Nodes that are offline for longer than the time bounds in Marvel normally fall off the screen, so some metadata must still be being reported by these nodes, which allows them to continue to be displayed.
  • Clicking on the name of any of the nodes (including the master) redirects to the root of Marvel instead of a node detail page.
  • No exceptions in the Elasticsearch or Kibana logs.
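
A quick way to see both index patterns side by side (a sketch using one of my hosts; adjust for your own cluster):

curl -XGET "http://elk1.cc.pdrop.net:9200/_cat/indices/.marvel-es-*?v"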

Screenshot

[screenshot of the Marvel Nodes page not preserved]

(Chris Earle) #2

Hi Chris (great name),

Can you show the output of

curl -XGET host:9200/_cat/plugins?v

from any of your nodes?

Thanks,
Chris


(Chris M) #3

Thanks Chris :D,

curl -XGET "http://elk1.cc.pdrop.net:9200/_cat/plugins?v=1"
name    component    version type url
Apophis license      2.3.0   j
Apophis marvel-agent 2.3.0   j
Anubis  license      2.3.0   j
Anubis  marvel-agent 2.3.0   j
Hathor  license      2.3.0   j
Hathor  marvel-agent 2.3.0   j
Nirrti  license      2.3.0   j
Nirrti  marvel-agent 2.3.0   j

(Chris Earle) #4

Just to be sure, can you verify that you have the proper version of the Kibana plugin?

$ grep version installedPlugins/marvel/package.json

If that all matches up, then the next step is to crack open the index to start to see what's there and not there.
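
For example (host is a placeholder), pulling back a single raw document will show whether node_stats documents are being written at all:

curl -XGET "host:9200/.marvel-es-1-*/node_stats/_search?size=1&pretty"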

Thanks,
Chris


(Chris M) #5

Hey Chris,
Thanks for your help:

root@500514ad55df:/opt/kibana# grep version installedPlugins/marvel/package.json
 "version": "2.3.0",

(Jared Kauppila) #6

I'm seeing the same thing as Chris, except that I can click on the names of my nodes and get the current stats on them (although I can't see the shard allocations on the nodes that are not master).

I'm running Shield, but I'm using the local exporter in Marvel.


(Chris M) #7

I'm not running Shield, just all the free stuff. :\


(Jared Kauppila) #8

I turned on DEBUG for Marvel, but I'm not seeing any errors on any of the nodes, just the following:

[2016-04-05 16:27:32,187][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:32,187][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:54,391][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:54,391][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:55,594][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:55,594][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:55,797][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:55,797][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]
[2016-04-05 16:27:56,000][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 0, value: .marvel-es-1]] in version [1]
[2016-04-05 16:27:56,000][DEBUG][marvel.agent.exporter.local] found index template [[cursor, index: 4, value: .marvel-es-data-1]] in version [1]

Looking within the .marvel-es-1-* index, it appears that I'm getting data from each of the nodes, but that it's just not being displayed on the node overview page.
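
For anyone who wants to run the same check, something along these lines should work (host is a placeholder), bucketing node_stats documents by node name:

curl -XPOST "host:9200/.marvel-es-1-*/node_stats/_search?pretty" -d '{
  "size": 0,
  "aggs": {
    "nodes": { "terms": { "field": "source_node.name" } }
  }
}'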


(Sébastien Barut) #9

Hi, same problem for me after migrating from 2.2 to 2.3.1.


(Chris Earle) #10

@tebriel: Since you were the first to report it, let's continue with your data as it sounds like it may be a wider issue.

Can you run this query against your cluster and attach the response?

GET /.marvel-es-1*/node_stats/_search
{
   "size" : 0,
   "aggs" : {
      "nodes" : {
         "date_histogram" : {
            "interval" : "10s",
            "field" : "timestamp",
            "order" : {
               "_key" : "desc"
            },
            "min_doc_count" : 1
         },
         "aggs" : {
            "source_node_name" : {
               "terms" : {
                  "field" : "source_node.name"
               },
               "aggs" : {
                  "source_node_transport_address" : {
                     "terms" : {
                        "field" : "source_node.transport_address"
                     }
                  }
               }
            }
         }
      }
   }
}

(Chris M) #11

Thanks!

{
  "_shards": {
    "failed": 0,
    "successful": 6,
    "total": 6
  },
  "hits": {
    "hits": [
      {
        "_id": "AVPS6Hs_HfEfRmF9lCyp",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPS6A-6HfEfRmF9lCoZ",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlVhXir2ZpyJFW7VZ",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlgxcir2ZpyJFW7Vc",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlnQ6ir2ZpyJFW7Ve",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlOzsir2ZpyJFW7VX",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlSQAir2ZpyJFW7VY",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlY0yir2ZpyJFW7Va",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlc2yir2ZpyJFW7Vb",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      },
      {
        "_id": "AVPSlj3rir2ZpyJFW7Vd",
        "_index": ".marvel-es-1-2016.04.01",
        "_score": 1.0,
        "_source": {},
        "_type": "node_stats"
      }
    ],
    "max_score": 1.0,
    "total": 122720
  },
  "timed_out": false,
  "took": 696
}

(Chris Earle) #12

@tebriel: Can you try resending the request as a POST? It looks like whatever tool you used stripped out the request's body (web browsers do not natively support sending GET requests with a body).
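
For example, with curl (reusing the elk1 host from your earlier output), which does send a body on POST:

curl -XPOST "http://elk1.cc.pdrop.net:9200/.marvel-es-1*/node_stats/_search?pretty" -d '{
   "size" : 0,
   "aggs" : {
      "nodes" : {
         "date_histogram" : {
            "interval" : "10s",
            "field" : "timestamp",
            "order" : { "_key" : "desc" },
            "min_doc_count" : 1
         },
         "aggs" : {
            "source_node_name" : {
               "terms" : { "field" : "source_node.name" },
               "aggs" : {
                  "source_node_transport_address" : {
                     "terms" : { "field" : "source_node.transport_address" }
                  }
               }
            }
         }
      }
   }
}'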


(Chris Earle) #13

@Jakauppila: To try a different approach: you are clearly naming your nodes. Can you try adding this setting to your Kibana configuration and then restarting Kibana, to see if it resolves the issue?

marvel.node_resolver: name

For anyone interested, the marvel.node_resolver setting was added in Marvel 2.3. The default -- and only other -- value for it is transport_address.
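
In other words, the default behavior is equivalent to explicitly setting:

marvel.node_resolver: transport_address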

I have had no luck reproducing this locally, but I want to see if this at least resolves it, which would help point to the problem.


(Jared Kauppila) #14

That looks to have fixed my problem! I can see the node info on the overview page, as well as the index/shard info when clicking into each node.


(Chris M) #15

So, I reran the query as a POST (duh, sorry, I was in a meeting and didn't think about the fact that my REST client won't send a body on a GET). The response is huge.

I'll try using the node_resolver next.
Thanks


(Chris Earle) #16

Hi @tebriel,

Given that your nodes have unique, static names (I had to check that they weren't in the default list, because Greek god names are so close sometimes!), I'm hopeful that it also fixes it for you. I'm looking over the data, though.

Thanks!


(Chris M) #17

To help debug, this is how I launch Elasticsearch inside a Docker container:

elasticsearch \
  -Des.cluster.name="pindrop_elk" \
  -Des.discovery.zen.ping.unicast.hosts="elk1.cc.pdrop.net, elk2.cc.pdrop.net, elk3.cc.pdrop.net, elk4.cc.pdrop.net" \
  -Des.node.name="Apophis" \
  -Des.network.bind_host="0.0.0.0" \
  -Des.network.publish_host=elk1.cc.pdrop.net \
  -Des.node.master=true \
  -Des.node.data=true \
  -Des.path.data=/usr/share/elasticsearch/data

(Chris M) #18

Yes, using the node_resolver does fix the issue for now, thanks!


(Chris Earle) #19

Awesome. As long as your transport address isn't changing (I noticed that it changed for Jared), it shouldn't be required for you to use this setting, but I'm glad that it resolves the issue.

We're digging into the root cause, and we'll hopefully have a fix out in the next release.


(Chris M) #20

Sweet, thanks! I'm pushing the updated kibana config out. Appreciate your help.