Metricbeat Perfmon is creating too many single events


(Dirk Lüneburger) #1

Hi Everyone,

I have been using 6.0.0-alpha2 until now and want to switch to 6.0.1, but after upgrading to 6.0.1 I see that every counter/result now creates its own event instead of all counters/results going into one event like before.

So let's say I use 10 counters in Metricbeat:

  • 6.0.0-Alpha2 metricbeat is sending 1 event with all 10 results to ES
  • 6.0.1 metricbeat is sending 10 events to ES

That's quite a big increase in space usage if I upgrade all servers to 6.0.1.
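As a rough back-of-the-envelope check, the document count scales linearly with the number of counters. A minimal Python sketch, assuming the 10s period from the config in this post and the 10 counters from the example above; the function name is made up for illustration, and it counts documents only, not bytes:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def documents_per_day(period_seconds: int, counters: int, one_event_per_counter: bool) -> int:
    """Documents one host sends per day for the perfmon metricset."""
    fetches = SECONDS_PER_DAY // period_seconds
    # Either every counter becomes its own document, or one document per fetch.
    return fetches * counters if one_event_per_counter else fetches


grouped = documents_per_day(10, 10, one_event_per_counter=False)
split = documents_per_day(10, 10, one_event_per_counter=True)
print(grouped, split)  # 8640 86400
```

Wildcard queries can match several instances, so the real number on 6.0.1 may be higher than this estimate.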

So, is that the normal behavior, and was 6.0.0-alpha2 doing something special?
I really want to send only one event with all counters/results in it.

Below are my config and the ES documents.

- module: windows
  metricsets: ["perfmon"]
  tags: "myserver00_service.mydomain.com"
  enabled: true
  period: 10s
  perfmon.counters:
    - instance_label: "service.mydomain.com cache api entries"
      instance_name: "service"
      measurement_label: "cache.api.entries"
      query: '\ASP.NET Apps (service)\Cache API Entries'
    - instance_label: "service.mydomain.com cache api hit ratio"
      instance_name: "service"
      measurement_label: "cache.api.hit.ratio"
      query: '\ASP.NET Apps (service)\Cache API Hit Ratio'
    - instance_label: "service.mydomain.com cache api hits"
      instance_name: "service"
      measurement_label: "cache.api.hits"
      query: '\ASP.NET Apps (service)\Cache API Hits'
    - instance_label: "service.mydomain.com cache api misses"
      instance_name: "service"
      measurement_label: "cache.api.misses"
      query: '\ASP.NET Apps (service)\Cache API Misses'
    - instance_label: "service.mydomain.com cache api trims"
      instance_name: "service"
      measurement_label: "cache.api.trims"
      query: '\ASP.NET Apps (service)\Cache API Trims'
    - instance_label: "service.mydomain.com cache api turnover rate"
      instance_name: "service"
      measurement_label: "cache.api.turnover.rate"
      query: '\ASP.NET Apps (service)\Cache API Turnover Rate'
    - instance_label: "service.mydomain.com cache total entries"
      instance_name: "service"
      measurement_label: "cache.total.entries"
      query: '\ASP.NET Apps (service)\Cache Total Entries'
    - instance_label: "service.mydomain.com cache total hit ratio"
      instance_name: "service"
      measurement_label: "cache.total.hit.ratio"
      query: '\ASP.NET Apps (service)\Cache Total Hit Ratio'
    - instance_label: "service.mydomain.com cache total hits"
      instance_name: "service"
      measurement_label: "cache.total.hits"
      query: '\ASP.NET Apps (service)\Cache Total Hits'
    - instance_label: "service.mydomain.com cache total misses"
      instance_name: "service"
      measurement_label: "cache.total.misses"
      query: '\ASP.NET Apps (service)\Cache Total Misses'
    - instance_label: "service.mydomain.com cache total trims"
      instance_name: "service"
      measurement_label: "cache.total.trims"
      query: '\ASP.NET Apps (service)\Cache Total Trims'
    - instance_label: "service.mydomain.com cache total turnover rate"
      instance_name: "service"
      measurement_label: "cache.total.turnover.rate"
      query: '\ASP.NET Apps (service)\Cache Total Turnover Rate'
    - instance_label: "service.mydomain.com.arrival.rate"
      instance_name: "service.mydomain.com"
      measurement_label: "cache.api.entries"
      query: '\HTTP Service Request Queues(service.mydomain.com)\ArrivalRate'
    - instance_label: "service.mydomain.com.cache.hit.rate"
      instance_name: "service.mydomain.com"
      measurement_label: "cache.hit.rate"
      alias: "cache.hit.rate"
      query: '\HTTP Service Request Queues(service.mydomain.com)\CacheHitRate'
    - instance_label: "service.mydomain.com.current.queue.size"
      instance_name: "service.mydomain.com"
      measurement_label: "current.queue.size"
      query: '\HTTP Service Request Queues(service.mydomain.com)\CurrentQueueSize'
    - instance_label: "service.mydomain.com.max.queue.item.age"
      instance_name: "service.mydomain.com"
      measurement_label: "max.queue.item.age"
      query: '\HTTP Service Request Queues(service.mydomain.com)\MaxQueueItemAge'
    - instance_label: "service.mydomain.com.rejected.requests"
      instance_name: "service.mydomain.com"
      measurement_label: "rejected.requests"
      query: '\HTTP Service Request Queues(service.mydomain.com)\RejectedRequests'
    - instance_label: "service.mydomain.com.rejected.rate"
      instance_name: "service.mydomain.com"
      measurement_label: "rejected.rate"
      query: '\HTTP Service Request Queues(service.mydomain.com)\RejectionRate'
    - instance_label: "service.mydomain.com.number.of.active.connectionpoolgroups"
      instance_name: "service.mydomain.com"
      measurement_label: "number.of.active.connectionpoolgroups"
      query: '\.NET Data Provider for SqlServer(_LM_W3SVC_5*)\NumberOfActiveConnectionPoolGroups'
    - instance_label: "service.mydomain.com.number.of.active.connectionpools"
      instance_name: "service.mydomain.com"
      measurement_label: "number.of.active.connectionpools"
      query: '\.NET Data Provider for SqlServer(_LM_W3SVC_5*)\NumberOfActiveConnectionPools'
    - instance_label: "service.mydomain.com.number.of.pooled.connections"
      instance_name: "service.mydomain.com"
      measurement_label: "number.of.pooled.connections"
      query: '\.NET Data Provider for SqlServer(_LM_W3SVC_5*)\NumberOfPooledConnections'
    - instance_label: "service.mydomain.com.number.of.reclaimed.connections"
      instance_name: "service.mydomain.com"
      measurement_label: "number.of.reclaimed.connections"
      query: '\.NET Data Provider for SqlServer(_LM_W3SVC_5*)\NumberOfReclaimedConnections'

Output section:

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["https://myurltoElasticCloud:9243"]
  index: "my-index-%{+yyyy.MM.dd}"

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  username: "elastic"
  password: "changeme"

setup.template.name: "my-index-*"
setup.template.pattern: "my-index-*"

Note: I need to post the ES document results in a second post.

Thanks in advance for any help.

Cheers,
Dirk


(Dirk Lüneburger) #2

The Elasticsearch documents that I received before and with 6.0.1:

6.0.0-Alpha2

{
  "_index": "my-index-2018.01.02",
  "_type": "doc",
  "_id": "xx",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2018-01-02T22:59:56.184Z",
    "beat": {
      "hostname": "myserver00",
      "name": "myserver00",
      "version": "6.0.0-alpha2"
    },
    "metricset": {
      "module": "windows",
      "name": "perfmon",
      "rtt": 2000
    },
    "tags": [
      "myserver00_service.mydomain.com"
    ],
    "windows": {
      "perfmon": {
        "arrival": {
          "rate": x
        },
        "cache": {
          "api": {
            "entries": x,
            "hit": {
              "ratio": x
            },
            "hits": x,
            "misses": x,
            "trims": x,
            "turnover": {
              "rate": x
            }
          },
          "hit": {
            "rate": x
          },
          "total": {
            "entries": x,
            "hit": {
              "ratio": x
            },
            "hits": x,
            "misses": x,
            "trims": x,
            "turnover": {
              "rate": x
            }
          }
        },
        "current": {
          "queue": {
            "size": x
          }
        },
        "max": {
          "queue": {
            "item": {}
          }
        },
        "number": {
          "of": {
            "active": {
              "connectionpoolgroups": x,
              "connectionpools": x
            },
            "pooled": {
              "connections": x
            }
          },
          "reclaimed": {
            "connections": x
          }
        },
        "rejected": {
          "rate": x,
          "requests": x
        }
      }
    }
  }
}
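The nesting in the document above follows the dots in the labels: each dot-separated part becomes one level of the object. A minimal Python sketch of that expansion, assuming this is roughly how dotted names end up as nested JSON. The function name and the sample values are made up for illustration, and this is not Metricbeat's actual code:

```python
def expand_dotted(flat: dict) -> dict:
    """Turn {"cache.api.hits": 1} into {"cache": {"api": {"hits": 1}}}."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create the intermediate level on first use, reuse it afterwards.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical values; the labels are taken from the config in post #1.
flat = {"cache.api.entries": 12, "cache.api.hits": 340, "rejected.rate": 0}
print(expand_dotted(flat))
```

Because labels that share a prefix land in the same sub-object, all counters fit into one document as long as no label is both a leaf and a prefix of another label.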

6.0.1

{
  "_index": "my-index-2018.01.10",
  "_type": "doc",
  "_id": "xx",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2018-01-10T09:54:31.384Z",
    "windows": {
      "perfmon": {
        "service": {
          "myservice.mydomain.com": {
            "com": {
              "rejected": {
                "rate": "service.mydomain.com"
              }
            }
          }
        },
        "rejected": {
          "rate": x
        }
      }
    },
    "metricset": {
      "module": "windows",
      "name": "perfmon",
      "rtt": 1002
    },
    "tags": [
      "myserver00_service.mydomain.com"
    ],
    "beat": {
      "name": "myserver00",
      "hostname": "myserver00",
      "version": "6.0.1"
    }
  }
}

(mazoutte) #3

Hello,

I'm also interested in a solution that keeps the Metricbeat agent.
I have the same issue with the latest version, 6.1.1.

Too many documents are created, which is terrible for disk space. And obviously, one document with all counters is better than all these separate documents (one per counter).

And when we compare the timestamp of each document, it is always the same.
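Since the per-counter documents share a timestamp, they could in principle be folded back together. A minimal Python sketch of that idea, assuming the events are available as dicts before indexing; this is a hypothetical post-processing step, not a Metricbeat option, and the sample values are made up:

```python
import copy
from collections import defaultdict


def deep_merge(target: dict, source: dict) -> dict:
    """Recursively merge source into target; on conflicts the later value wins."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            # Copy so that later merges never modify the input events.
            target[key] = copy.deepcopy(value)
    return target


def merge_by_timestamp(events: list) -> list:
    """Fold all events that share an @timestamp into one document."""
    merged = defaultdict(dict)
    for event in events:
        deep_merge(merged[event["@timestamp"]], event)
    return [merged[ts] for ts in sorted(merged)]


events = [
    {"@timestamp": "2018-01-10T09:54:31.384Z", "windows": {"perfmon": {"rejected": {"rate": 0}}}},
    {"@timestamp": "2018-01-10T09:54:31.384Z", "windows": {"perfmon": {"rejected": {"requests": 3}}}},
    {"@timestamp": "2018-01-10T09:54:41.384Z", "windows": {"perfmon": {"rejected": {"rate": 1}}}},
]
merged_docs = merge_by_timestamp(events)
print(len(merged_docs))  # 2
```

With several hosts writing to the same index, the grouping key would also need the host name, not only the timestamp.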

A workaround is possible if you use Filebeat + the built-in Windows perfmon writing to CSV + a Logstash parser.
You need to restart the Data Collector Set on a schedule (with a dedicated scheduled task) so that it overwrites the CSV file.
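To show why the CSV route gives one document per sample: each CSV row already holds all counters for one timestamp. A minimal Python sketch of the parsing step, standing in for the Logstash parser. It assumes the usual perfmon CSV layout (first column is the timestamp, the header row holds the full counter paths); the sample text, host name and output field names are made up:

```python
import csv
import io


def rows_to_documents(csv_text: str) -> list:
    """Turn each perfmon CSV row into one document holding all its counters."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    counter_paths = header[1:]  # column 0 is the timestamp column
    documents = []
    for row in reader:
        doc = {"timestamp": row[0], "counters": {}}
        for path, raw in zip(counter_paths, row[1:]):
            if raw.strip():  # skip blank cells
                doc["counters"][path] = float(raw)
        documents.append(doc)
    return documents


# Hypothetical sample; real files have one column per configured counter.
sample = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\myserver00\ASP.NET Apps(service)\Cache API Hits","\\myserver00\ASP.NET Apps(service)\Cache API Misses"',
    r'"01/10/2018 09:54:31.384","340"," "',
])
docs = rows_to_documents(sample)
print(docs)
```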

The main problem with this solution is deploying the perfmon "package" and maintaining the configuration of the collector set.
If you have a few servers, this can be a good workaround.
If you have more than a hundred servers, it is difficult to maintain if you plan to change the config regularly.


(Dirk Lüneburger) #4

Hi @mazoutte,

Thanks for the workaround, but then I would rather stay on 6.0.0-alpha2 for now.

But it would be great to get some more information about this strange behavior of creating one document for every counter :slight_smile:


(Andrew Kroh) #5

I think this is related to the wildcard query changes. See the discussion in https://github.com/elastic/beats/pull/4502#issuecomment-308565638.

Maybe there is some middle ground where we can combine data for non-wildcard queries.
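One way to picture that middle ground, as a hypothetical Python sketch rather than the actual Beats code: split the configured queries by whether they contain a wildcard, put all non-wildcard results into one event, and keep wildcard results separate. The function names are made up; the queries are taken from the config in post #1:

```python
def is_wildcard(query: str) -> bool:
    """A query is treated as a wildcard query if it contains '*'."""
    return "*" in query


def plan_events(queries: list) -> tuple:
    """Return (queries that can share one event, queries that need their own events)."""
    combined = [q for q in queries if not is_wildcard(q)]
    separate = [q for q in queries if is_wildcard(q)]
    return combined, separate


queries = [
    r"\ASP.NET Apps (service)\Cache API Hits",
    r"\HTTP Service Request Queues(service.mydomain.com)\ArrivalRate",
    r"\.NET Data Provider for SqlServer(_LM_W3SVC_5*)\NumberOfPooledConnections",
]
combined, separate = plan_events(queries)
print(len(combined), len(separate))  # 2 1
```

The wildcard queries stay separate because the number of matched instances is only known at fetch time.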


(Dirk Lüneburger) #6

I'm confused that no one else is complaining about this :roll_eyes: (OK, mazoutte did).

Because that's a big change in the output compared to 6.0.0-alpha2.

If there is no other solution, then I will stay on 6.0.0-alpha2. Not the best way, but better than the unneeded growth in index space.


(Andrew Kroh) #9

I just wrote a proposal for grouping counters related to an object into a single event. Comments are welcome on this issue.
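The proposal text itself is not part of this thread, so the following is only one reading of "grouping counters related to an object": parse each counter path into object, instance and counter, then build one event per object/instance pair. A minimal Python sketch under that assumption; the regular expression, function name, event shape and sample values are all made up for illustration:

```python
import re
from collections import defaultdict

# \Object(instance)\Counter, with an optional space before the instance part.
PATH_RE = re.compile(r"^\\(?P<object>[^\\(]+?)\s*(?:\((?P<instance>[^)]*)\))?\\(?P<counter>.+)$")


def group_by_object(samples: dict) -> list:
    """Build one event per (object, instance) from {counter path: value}."""
    grouped = defaultdict(dict)
    for path, value in samples.items():
        match = PATH_RE.match(path)
        if match is None:
            raise ValueError(f"not a counter path: {path!r}")
        grouped[(match["object"], match["instance"])][match["counter"]] = value
    return [
        {"object": obj, "instance": inst, "counters": counters}
        for (obj, inst), counters in grouped.items()
    ]


# Hypothetical values; the paths are taken from the config in post #1.
samples = {
    r"\ASP.NET Apps (service)\Cache API Hits": 340,
    r"\ASP.NET Apps (service)\Cache API Misses": 12,
    r"\HTTP Service Request Queues(service.mydomain.com)\ArrivalRate": 5,
    r"\HTTP Service Request Queues(service.mydomain.com)\RejectedRequests": 0,
}
object_events = group_by_object(samples)
print(len(object_events))  # 2
```

For the config in post #1 this would give three events per fetch (ASP.NET Apps, HTTP Service Request Queues, .NET Data Provider for SqlServer) instead of one per counter, plus extra events if the wildcard matches several instances.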

