How to know how many Metricbeat agents are up and running

We have configured Metricbeat agents on 100 VMs, and we are receiving data from all of those VMs.
But we also need to monitor the Metricbeat agents themselves.
If any Metricbeat agent stops working, we need to identify that agent and show it on a dashboard.

We know Heartbeat is another option for this, but we don't want to install Heartbeat on all the VMs.

Please suggest an alternative way to check the status of the Metricbeat agents.

Hi @sai07

A couple of options.

Assuming you are using a fairly recent version of the Elastic Stack, this is built into the Metrics alerts.

Assuming what you really want is to be alerted if a Metricbeat agent stops sending data, you should be able to use the following.

Heartbeat:
As you also noted, you could use Heartbeat to tell whether your VMs are reachable over the network, but I think you have misunderstood how Heartbeat works. You don't install it on every VM. You install it on a separate VM, and from there you configure it to ping all 100 VMs you need to monitor, which is exactly what it is for. The results show up in the Uptime app, and you can be alerted on those as well.
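
To make that concrete, here is a minimal sketch (not an official tool) that prints a heartbeat.yml monitors section for a list of VMs, so you don't have to hand-write 100 host entries. The host names, the monitor id, and the 30s schedule are placeholders I made up, and it assumes a plain ICMP ping monitor is enough for your case.

```python
import json


def render_icmp_monitor(hosts, monitor_id="vm-ping", schedule="@every 30s"):
    """Render a heartbeat.monitors section with one ICMP monitor covering all hosts."""
    lines = [
        "heartbeat.monitors:",
        "- type: icmp",
        f"  id: {monitor_id}",
        f"  name: {monitor_id}",
        f"  schedule: '{schedule}'",
        # A JSON array is also a valid YAML flow sequence.
        f"  hosts: {json.dumps(list(hosts))}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical VM names; replace with your own inventory.
    vms = [f"vm-{i:03d}.example.internal" for i in range(1, 101)]
    print(render_icmp_monitor(vms))
```

You would paste the output into heartbeat.yml on the one VM that runs Heartbeat. Keep in mind this only tells you the VM answers pings, not that Metricbeat on it is running.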

Thanks, we can see the data, but not for all the VMs.

We have 3 or 4 VMs per customer, but we can see the status of only one VM.

We configured Metricbeat on 76 VMs, and we are able to see only 28.

Please let us know how to see all the VMs.
In the Metrics Explorer tab we can see all the VM details. Please find the attached screenshots.


I am not clear on the issue or the architecture.

Is it 28 hosts, each with multiple VMs running as containers on that host?

Are all VMs reporting data? How do you know?

Did you press load more charts?

If they are actually containers, did you try the different options under the Show pulldown where it says Hosts?

Is it 28 hosts, each with multiple VMs running as containers on that host?

No, we are not using containers.

Are all VMs reporting data? How do you know?

Yes, we can see them in the Metrics Explorer tab.

Did you press load more charts?

Yes, we can see all the VMs in Metrics Explorer but not in the Inventory.

If they are actually containers, did you try the different options under the Show pulldown where it says Hosts?

No, they are all separate VMs.

First, stop the auto-refresh on the Inventory. Sometimes I have issues with that.

Have you switched to the table view on the right to see what shows up there?

Interesting. Can you show me how you see all 76 hosts in Metrics Explorer?

And did you press load more charts?

Here is an easy way to test how many hosts are actually reporting, from a data perspective.

Go to Dev Tools

and run this. It will tell you how many unique Metricbeat hosts have reported data in the last 15m.

GET /metricbeat-*/_search
{
  "size" : 0,
  "aggs": {
    "hosts": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now-15m"
          }
        }
      },
      "aggs": {
        "num_hosts": {
          "cardinality": {
            "field": "host.name"
          }
        }
      }
    }
  }
}

You should get something like this. The value under num_hosts is the number of hosts.

{
  "took" : 123,
  "timed_out" : false,
  "_shards" : {
    "total" : 7,
    "successful" : 7,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "hosts" : {
      "meta" : { },
      "doc_count" : 500770,
      "num_hosts" : {
        "value" : 4 <<<-------- This Number 
      }
    }
  }
}

If you actually want to see which hosts are there, run this. It will list the hosts (up to 200, because of the size setting).

GET /metricbeat-*/_search
{
  "size": 0,
  "aggs": {
    "hosts": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now-15m"
          }
        }
      },
      "aggs": {
        "num_hosts": {
          "terms": {
            "field": "host.name",
            "size": 200
          }
        }
      }
    }
  }
}
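
If you keep a list of the hosts that should be reporting, you can turn the query above into a "who went quiet" check. This is a rough sketch, not a built-in Elastic feature: it widens the window to 24h, adds a max sub-aggregation on @timestamp to get a last-seen time per host, and compares the buckets with your expected list. The expected host names, the 24h window, and the 15 minute threshold are all assumptions. You would run the query yourself (Dev Tools or any client) and pass the JSON response in.

```python
from datetime import datetime, timedelta, timezone

# Body for: GET /metricbeat-*/_search
# Same idea as the terms query above, but over a wider window and with a
# max(@timestamp) sub-aggregation so each bucket carries a last-seen time.
LAST_SEEN_QUERY = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-24h"}}},
    "aggs": {
        "hosts": {
            "terms": {"field": "host.name", "size": 200},
            "aggs": {"last_seen": {"max": {"field": "@timestamp"}}},
        }
    },
}


def silent_hosts(response, expected_hosts, now, max_age=timedelta(minutes=15)):
    """Return expected hosts that are absent or whose newest document is older than max_age."""
    last_seen = {}
    for bucket in response["aggregations"]["hosts"]["buckets"]:
        # A max aggregation on a date field returns epoch milliseconds in "value".
        millis = bucket["last_seen"]["value"]
        last_seen[bucket["key"]] = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

    silent = []
    for host in expected_hosts:
        seen = last_seen.get(host)
        if seen is None or now - seen > max_age:
            silent.append(host)
    return sorted(silent)
```

Call it as silent_hosts(response, my_hosts, datetime.now(timezone.utc)). A host that has been silent for longer than the 24h window does not appear in the buckets at all, which is why the expected list is needed.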

Now I can see the CPU usage for all the hosts in the Inventory. Please help me set up an alert for any host that stops sending data.

We are not able to create an alert for that; please see the error message below.

We are using the Basic license. Could that be the reason?

In the first picture you are looking at CPU, and in the alert you are looking at Load.
Where did you create that alert from, the Metrics Inventory?
Basic should be fine, but you will only be able to send the alert to an index or the log.
Did you set up a connector?

Please find the error message below.

Yeah, we want to visualize the data, that's it.

Did you look at the details of the error message, under the little pull-down arrow?

There is nothing in there.

Where are you creating this? Hmm, not sure. Mine works on the first try.

If you go to Metrics Explorer, can you graph the CPU?

I can test it now, but I am unable to create the rule.

You have to create an action / connector. Please refer to the docs.

Again, we are not able to see all the VMs in the Inventory. As I said earlier in this conversation, we have configured Metricbeat agents on 74 VMs, but we can see only 18.

Can you create a new visualisation of type Metric?

In the Metric visualisation, do a unique count on the metricbeat indices using the field host.id.

If we set the field to host.hostname, we get a count of 17.
If we set the field to host.hostname.keyword, we get a count of 55.
Please find the attached screenshot.

OK, I see an issue.

I am not sure it is the issue, but there is certainly an issue, and it is most likely causing this strange behavior.

Some of your beats are ingesting with the proper mapping (17). Some of them are not (55).

You have two data types for the field host.hostname.

See where you have host.hostname.keyword? That is not correct; it should not be there. Those agents are writing to an index without the proper mapping, which creates the host.hostname field with two types, text and keyword. That is not correct.

A proper metricbeat index only has host.hostname
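
One way to find out which indices are affected is to run GET /metricbeat-*/_mapping/field/host.hostname in Dev Tools and look at the type reported per index. The sketch below just walks that response and lists indices where the field is not a plain keyword. It assumes the response has the usual get-field-mapping shape; the function name is mine and the index names in the usage note are invented.

```python
def badly_mapped_indices(field_mapping_response, field="host.hostname"):
    """Return (index, type) pairs where `field` is missing or not mapped as keyword."""
    # The get-field-mapping response keys the inner mapping by the last path segment.
    leaf = field.split(".")[-1]
    bad = []
    for index, body in field_mapping_response.items():
        mapping = body.get("mappings", {}).get(field, {}).get("mapping", {})
        field_type = mapping.get(leaf, {}).get("type")  # None if the field is absent
        if field_type != "keyword":
            bad.append((index, field_type))
    # Sort by index name only, since the type may be None.
    return sorted(bad, key=lambda pair: pair[0])
```

For a dynamically mapped index you would expect something like ("metricbeat-some-index", "text") in the result, while indices created from the Metricbeat template should not show up at all.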

I am not sure how you got into this state, but that's the problem.

Did you start some of the agents before you ran setup for the first time? Are you writing to a custom index?

Those 55 that are writing to host.hostname.keyword are not writing to the correctly mapped index.

This can happen when you start a beat without running setup first; those beats then ingest into an incorrectly mapped index.

Note that you only need to run setup once, but you need to do it before you start any Metricbeat, whether you then start one beat or 74.

You might first try stopping and starting the beats on a couple of the 55 hosts to see if that works.

Also, if you supply the output of

GET _cat/indices/?v

we could take a look.

You may need to clean up the bad indices.

You know, the new Fleet also collects metrics. It's a little less error-prone, but it is new and still in beta.
