Deleted .marvel-es-data

monitoring

(Tim Desrochers) #1

Elasticsearch 2.0
Marvel 2.0
Kibana 4.2.1

I set up collecting data from a cluster into a "monitoring" cluster. My exporter config on my ES nodes is:

marvel.agent.exporters:
  id1:
    type: http
    host: [ "10.1.55.21:9200", "10.1.55.22:9200" ]

This is the same config on all the ES nodes I want to collect data from. After installing the license and agent on all the ES nodes, I installed the plugin into Kibana. After some time I realized that I was also collecting data on my monitoring cluster, not just my ES nodes, so I opened elasticsearch.yml on the monitoring cluster and added:

marvel.enabled: false

Then I deleted all indices matching .marvel-*, which of course deleted the .marvel-es-data index. After I did this I went to Kibana, clicked on the Marvel plugin, and noticed it said "Waiting for Marvel Data" and never moves past this. I am collecting Marvel data from my ES nodes (I see the indices being created and growing) but nothing shows up in the plugin. I assume the problem is that I deleted the .marvel-es-data index, but I'm not sure.
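For the record, the cleanup amounted to something like this (a sketch, not the exact commands; the wildcard delete assumes the default `action.destructive_requires_name: false`):

```
# run against the monitoring cluster
DELETE /.marvel-*
GET /_cat/indices/.marvel*?v
```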

Everything else seems to be working. On my monitoring cluster, in the /var/lib/elasticsearch directory, I see folders for both my monitoring cluster and my ES cluster, so I know it sees the data, but it's not being displayed.

How do I get that index back? I've tried restarting nodes and uninstalling the plugin on the monitoring cluster, the Kibana node, and the ES nodes, with no luck.


(Tanguy) #2

Hi,

Just to be sure, this setting is on your monitoring cluster, right? Not on the cluster you want to monitor.

Also, you don't need to install the marvel-agent plugin on your monitoring cluster, only on the cluster you want to monitor.
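In other words, the split looks like this (a sketch assembled from the settings already posted in this thread, not a full config):

```yaml
# elasticsearch.yml on each MONITORED node (marvel-agent and license installed here)
marvel.agent.exporters:
  id1:
    type: http
    host: [ "10.1.55.21:9200", "10.1.55.22:9200" ]

# elasticsearch.yml on the MONITORING cluster nodes (no marvel-agent needed)
marvel.enabled: false
```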

What's the output of `GET /_cat/shards?v` on your monitoring cluster?

Does that mean that your monitored cluster and your monitoring cluster are installed on the same machine? How did you install Elasticsearch? Can you check that they do not share the same cluster name or the same configuration files?

Thanks


(Tim Desrochers) #3

Yes, the setting is on my monitoring cluster, not the monitored one.

Thank you for the clarification on where the plugin is needed; I wasn't sure, since the docs say to put that setting on your monitoring cluster.

The output of `_cat/shards`:

index                 shard prirep state   docs  store ip         node
.kibana               0     r      STARTED    2 15.5kb 10.1.55.22 HEALTH_NODE_2
.kibana               0     p      STARTED    2 15.4kb 10.1.55.21 HEALTH_NODE_1
.marvel-es-2015.11.24 0     r      STARTED 6847  1.8mb 10.1.55.22 HEALTH_NODE_2
.marvel-es-2015.11.24 0     p      STARTED 6847  3.7mb 10.1.55.21 HEALTH_NODE_1

Lastly, no, my monitoring and monitored clusters are not on the same machine; they are all separate VMs. I installed ES from the RPM package, and they have very different cluster names.


(Tim Desrochers) #4

I should correct myself: they are on the same physical machine, just in separate VMs.


(Tanguy) #5

Thanks.

I agree the documentation is a bit confusing here, we'll improve that.

Can you give us the output of `GET /_cat/nodes?v` on both your monitored and monitoring clusters, please?

I'm afraid I cannot reproduce your issue, but you're not the first one to report this problem, so I suspect there's something wrong under the covers.


(Tim Desrochers) #6

Monitoring Cluster

 curl 10.1.55.21:9200/_cat/nodes?v
host       ip         heap.percent ram.percent load node.role master name
10.1.55.21 10.1.55.21           42           9 0.81 d         *      HEALTH_NODE_1
10.1.55.22 10.1.55.22           50           7 0.44 d         m      HEALTH_NODE_2

Monitored Cluster

curl 10.1.55.2:9200/_cat/nodes?v
host       ip         heap.percent ram.percent load node.role master name
10.1.55.19 10.1.55.19            0           7 0.25 -         -      KIBANA_SEARCH_BALANCER_2
10.1.55.9  10.1.55.9            39          26 1.11 d         -      WORKER_NODE_2
10.1.55.13 10.1.55.13           41          27 0.89 d         -      WORKER_NODE_6
10.1.55.18 10.1.55.18            5           6 0.17 -         -      KIBANA_SEARCH_BALANCER_1
10.1.55.10 10.1.55.10           37          26 0.63 d         -      WORKER_NODE_3
10.1.55.20 10.1.55.20            4           4 0.06 -         m      MASTER_NODE_3
10.1.55.12 10.1.55.12           34          25 1.23 d         -      WORKER_NODE_5
10.1.55.11 10.1.55.11           37          26 1.60 d         -      WORKER_NODE_4
10.1.55.14 10.1.55.14           39          26 0.55 d         -      WORKER_NODE_7
10.1.55.8  10.1.55.8            47          30 0.62 d         -      WORKER_NODE_1
10.1.55.7  10.1.55.7             7           3 0.21 -         m      MASTER_NODE_2
10.1.55.16 10.1.55.16           35          25 1.10 d         -      WORKER_NODE_9
10.1.55.15 10.1.55.15           35          26 0.80 d         -      WORKER_NODE_8
10.1.55.17 10.1.55.17           41          28 0.71 d         -      WORKER_NODE_10
10.1.55.2  10.1.55.2            51           6 0.18 -         *      MASTER_NODE_1

(Tanguy) #7

Thanks.

Do you see anything wrong in the logs of the master node(s)?


(Tim Desrochers) #8

I started getting this around midnight:

[2015-11-25 00:00:13,872][WARN ][http.netty               ] [HEALTH_NODE_1] Caught exception while handling client http traffic, closing connection [id: 0xc3c2bfc2, /10.1.55.2:48602 => /10.1.55.21:9200]

[2015-11-25 09:23:20,873][INFO ][rest.suppressed          ] /_bulk Params: {}
ElasticsearchParseException[Failed to derive xcontent]

I stopped all the ES nodes in my monitored cluster (to do some other maintenance), and when I restarted them Marvel showed me data, but when I came back today I am getting the above error.

EDIT***

Further research shows the following on my master nodes in my monitored cluster:

[2015-11-25 10:19:11,695][ERROR][marvel.agent.exporter.http] failed sending data to [http://10.1.55.22:9200/_bulk]: IOException[Error writing to server]
[2015-11-25 10:19:11,695][ERROR][marvel.agent             ] [MASTER_NODE_1] background thread had an uncaught exception
java.lang.OutOfMemoryError: Java heap space

Currently my master nodes have 32 GB of RAM, with 24 GB set in /etc/sysconfig/elasticsearch:
ES_HEAP_SIZE=24g

I'm working on getting all nodes updated to 64 GB with ES_HEAP_SIZE set to 30g.
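As a sanity check on those numbers, here's a sketch of the usual sizing rule (roughly half of physical RAM, capped just under the ~32 GB compressed-oops cutoff; the 31 GB cap is general JVM guidance, not something from this thread):

```shell
# Sketch: derive ES_HEAP_SIZE from physical RAM.
# Rule of thumb: ~50% of RAM, but stay below ~32 GB so the JVM keeps
# using compressed object pointers. RAM_GB=64 matches the upgraded nodes above.
RAM_GB=64
HEAP_GB=$(( RAM_GB / 2 ))
if [ "$HEAP_GB" -gt 31 ]; then
  HEAP_GB=31
fi
echo "ES_HEAP_SIZE=${HEAP_GB}g"
```

So 30g on a 64 GB box is comfortably within that guidance.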


(Steve Kearns) #9

Hi Tim,

Is it possible to upgrade to ES/Marvel 2.1 and Kibana 4.3? We made a number of improvements to the Marvel agent that fix some bugs which could lead to high memory usage.

Thanks,
Steve


(Tim Desrochers) #10

Absolutely. I will download the packages and push them to my machines today, and I'll let you know the results.

Thanks


(Tim Desrochers) #11

Steve

I performed a rolling upgrade of my nodes from ES 2.0 to 2.1, Marvel 2.0 to 2.1, and Kibana 4.2 to 4.3. So far Marvel is working as expected.
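For anyone repeating this, the per-node part of the rolling upgrade is the standard allocation dance (a sketch of the usual procedure, not necessarily the exact commands I ran):

```
# before stopping each node
PUT /_cluster/settings
{ "transient": { "cluster.routing.allocation.enable": "none" } }

# upgrade the packages, restart the node, wait for it to rejoin, then
PUT /_cluster/settings
{ "transient": { "cluster.routing.allocation.enable": "all" } }
```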

I like the shard allocation section you added; it's very nice to see what is going on, and the history button is nice as well. Thanks for the help, and nice work on the product.


(Steve Kearns) #12

I'm glad that resolved the issue! Thanks for the kind words, I'll pass them on to the rest of the team!


(Tanguy) #13

Happy to see your problem solved.

For the record, a bug in Marvel 2.0 was found and fixed in 2.1, and I'm pretty sure you hit it.


(Tim Desrochers) #14

Not sure what is going on here, but there may be a bug in 2.1 as well: the node names shown in Marvel are not my actual node names. See image:

I gave my nodes easily recognizable names, not what look like hash values of the names. Any suggestions?


(Tim Desrochers) #15

Can anyone let me know why the names of my nodes show up as strange strings of letters? This only happened after upgrading to Marvel 2.1.


(Tanguy) #16

I think you hit the same issue described here: Node names missing after upgrading ES and Marvel to 2.1, KB to 4.3.0


(system) #17