APM Server Cluster Monitoring in Kibana

monitoring

(Ronald Tumulak) #1

Hi,

We've set up an Elastic APM cluster composed of three APM servers fronted by Haproxy.

Load testing results have been great on this stack, and I can see all APM Server instances happily distributing the load across a cluster of seven Elasticsearch nodes (3 master-only, 4 data-only). I confirmed this in the Haproxy response headers.

One thing I noticed is that Kibana monitoring only shows a single APM Server instance, which I suspect is the first APM Server to register in Elasticsearch. How do I get the other APM Server instances to display in Kibana? The APM Servers count is "1".

Our setup:

Haproxy ----> (3) APM Servers ----> (7) Elasticsearch <---Kibana

All are installed via docker.

I've tested two configurations for the APM Servers: individually started containers and a Docker Swarm scale-up. The result is the same either way: only one APM Server instance is displayed on the Kibana monitoring page (identified by its Docker container ID). The APM Server containers are all built from the same image and share the same configuration, except that the individually started containers each have their own published IP address, which Haproxy uses.
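Roughly, the two variants look like this (names, ports, and flags are illustrative, not our exact commands):

```shell
# Variant 1: three individually started containers, each with its own
# published port that Haproxy points at
docker run -d --name apm-server-1 -p 8201:8200 docker.elastic.co/apm/apm-server:6.5.4
docker run -d --name apm-server-2 -p 8202:8200 docker.elastic.co/apm/apm-server:6.5.4
docker run -d --name apm-server-3 -p 8203:8200 docker.elastic.co/apm/apm-server:6.5.4

# Variant 2: a single swarm service scaled to three replicas
docker service create --name apm-server --replicas 3 -p 8200:8200 \
  docker.elastic.co/apm/apm-server:6.5.4
```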

Kibana correctly shows 7 Elasticsearch server instances.

[Edit]
Forgot to mention that Machine Learning shows three beat names corresponding to the docker ids of the APM Server containers.

Any help is appreciated as we'd like to be able to monitor all APM Server instances in Kibana.

Thanks!

Ronald


(Gil Raphaelli) #2

@digitalron I'm glad to hear the load testing is going well. I'm only able to reproduce your results when setting the name directive in apm-server.yml. I see in your edit that ML shows three beat names, but can you confirm those are either not set or unique across your apm-servers? The Kibana UI essentially runs this query:

GET .monitoring-beats-6-*/_search
{
  "size": 0, 
  "aggs": {
    "apm-servers": {
      "terms": {
        "field": "beats_state.beat.name",
        "size": 10
      }
    }
  }
}

If that returns multiple entries then something is indeed wrong. An aggregation on beats_state.beat.host should show all apm-servers regardless of that setting, so you can compare the results of those queries to be sure each server is reporting in.
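For comparison, the host-based aggregation would be a minimal variation of the query above (same index pattern, different field):

```
GET .monitoring-beats-6-*/_search
{
  "size": 0,
  "aggs": {
    "apm-servers": {
      "terms": {
        "field": "beats_state.beat.host",
        "size": 10
      }
    }
  }
}
```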

If everything checks out, can you provide more details about your installation including Elastic Stack component versions and relevant apm-server.ymls?


(Christian Dahlqvist) #3

Beats typically generate an identifier when they first start up and persist it, so if you simply clone a container's data you can end up with multiple instances that share the same ID and look like one.
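As a rough sketch of that persist-or-generate behavior (the file name, layout, and field are illustrative; apm-server's actual implementation differs):

```python
import json
import os
import tempfile
import uuid


def load_or_create_beat_uuid(data_path):
    """Return the persisted instance ID, generating one on first start.

    Loosely mimics how a Beat persists its ID under its data path;
    the real file layout and field names may differ.
    """
    meta_file = os.path.join(data_path, "meta.json")
    if os.path.exists(meta_file):
        with open(meta_file) as f:
            return json.load(f)["uuid"]
    new_uuid = str(uuid.uuid4())
    os.makedirs(data_path, exist_ok=True)
    with open(meta_file, "w") as f:
        json.dump({"uuid": new_uuid}, f)
    return new_uuid


# Two "containers" sharing a cloned data dir report the same ID:
shared = tempfile.mkdtemp()
a = load_or_create_beat_uuid(shared)
b = load_or_create_beat_uuid(shared)  # clone: same ID, looks like one beat
# A separate data dir gets its own ID:
c = load_or_create_beat_uuid(tempfile.mkdtemp())
```

Cloning the data directory (or baking it into a shared image layer) clones the persisted ID, so all replicas report as one beat.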


(Ronald Tumulak) #4

Thanks Gil and Christian for the responses. I double-checked the settings and ran the query above on both setups, with the following results:

3 individually created APM Server containers
- One APM Server found in Monitoring
- The APM Server name in Monitoring is the same as the container id
- Three beat ids found in ML, corresponding to their respective docker container ids
- The Dev Tools query in Kibana returned a single beats_state; its host and name corresponded to the docker container id and matched what is displayed in Monitoring

3 docker swarm-scaled APM Server containers
- One APM Server found in Monitoring
- The APM Server name in Monitoring is the same as the container id
- Three beat ids found in ML, corresponding to their respective docker container ids
- The Dev Tools query in Kibana did not return a beats_state property. It did have beats_stats, but like the one above it returned a single host and name corresponding to the docker id displayed under Monitoring

Both used the same docker image for APM Server 6.5.4.
I did not set any name, and spinning the containers down and up again results in the same behavior but with a different APM Server name, which points to dynamic naming based on the docker container id.

Additional inputs:
The individually created APM Server containers are assigned static published ports that Haproxy points to. The swarm-scaled ones each use the default port 8200.

Thanks!

Ronald


(Ronald Tumulak) #5

Related screenshots

Individually created containers


Docker Swarm-scaled containers



(Christian Dahlqvist) #6

So you are launching them directly based off the official base image?


(Ronald Tumulak) #7

Yes. Got them from www.docker.elastic.co


(Shaunak Kashyap) #8

Hi @digitalron,

Could you run the following query please?

POST .monitoring-beats*/_search
{
  "size": 0,
  "aggs": {
    "by_uuid": {
      "terms": {
        "field": "beats_stats.beat.uuid",
        "size": 10
      }
    }
  }
}

(Ronald Tumulak) #9

Hi Shaunak, here is the result of running the query:

{
  "took" : 25,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 12339,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "by_uuid" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "8a6e4932-c161-4d82-a192-fc532a04354b",
          "doc_count" : 10560
        }
      ]
    }
  }
}

(Ronald Tumulak) #10

Some additional findings: I observed the monitoring page for some time and noticed that once in a while, the node being displayed changes:

One moment:

After a few seconds...

There is no pattern to how the displayed node changes on each refresh. One node can be displayed for four successive refreshes, then they alternate, then the other is displayed three times in succession, then they alternate again, seemingly at random.


(Ronald Tumulak) #11

Okay, so I cleared everything and started from scratch: re-pulled the 6.5.4 Docker images and re-ran docker-compose, and all three APM Servers now show on the monitoring page.

I have no idea what happened, but it works now, so I guess in such a situation it's probably best to just create fresh APM Server container instances.

This can be closed now. Thanks!


(system) closed #12

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.