Metricbeat 5.2.2 configtest fails to find templates


(Cliff Sobchuk) #1

I am using Metricbeat 5.2.2 on Ubuntu 14.04.05 LTS (64-bit) and am trying to run the Metricbeat config test from the /usr/share/metricbeat/bin directory as:
sudo ./metricbeat -c /etc/metricbeat/metricbeat.yml -configtest

However, I get the following error reported:
Exiting: error initializing publisher: Error loading template /usr/share/metricbeat/bin/metricbeat.template.json: open /usr/share/metricbeat/bin/metricbeat.template.json: no such file or directory

My /etc/metricbeat directory does include the template files as well:
ls /etc/metricbeat
metricbeat.full.yml metricbeat.template-es2x.json metricbeat.template.json metricbeat.yml

My metricbeat.yml file is:
###################### Metricbeat Configuration Example #######################

# This file is an example configuration file highlighting only the most common
# options. The metricbeat.full.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/metricbeat/index.html

#==========================  Modules configuration ============================
metricbeat.modules:

#------------------------------- System Module -------------------------------
- module: system
  metricsets:
    # CPU stats
    - cpu

    # System Load stats
    - load

    # Per CPU core stats
    - core

    # IO stats
    - diskio

    # Per filesystem stats
    - filesystem

    # File system summary stats
    - fsstat

    # Memory stats
    - memory

    # Network stats
    - network

    # Per process stats
    - process

    # Sockets (linux only)
    - socket
  enabled: true
  period: 10s
  processes: ['.*']
  cgroups: true
  filters:
   drop_event:
     system.process.cgroup.memory.kmem.limit.bytes


#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

#================================ Outputs =====================================

# Configure what outputs to use when sending the data collected by the beat.
# Multiple outputs may be used.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.0.40:9200"]
  template.name: "metricbeat"
  template.path: "metricbeat.template.json"
  template.overwrite: true
  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: critical, error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

I also can't get the filters: stanza to drop the offending event; it still ends up in the /var/log/metricbeat/metricbeat file with an error:
2017-05-11T14:53:28Z WARN Can not index event (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse [system.process.cgroup.memory.kmem.limit.bytes]","caused_by":{"type":"json_parse_exception","reason":"Numeric value (18446744073709551615) out of range of long (-9223372036854775808 - 9223372036854775807)\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@5996f58a; line: 1, column: 1103]"}}
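
For reference, the value rejected in the error above is 2^64 - 1, the largest unsigned 64-bit integer, while the range quoted in the message is that of a signed 64-bit long. A minimal Python check of that arithmetic follows. It is an illustration only, not Metricbeat or Elasticsearch code, and the names LONG_MIN, LONG_MAX, and fits_in_long are made up for this sketch:

```python
# Range of a signed 64-bit long, as quoted in the error message.
LONG_MIN = -(2**63)      # -9223372036854775808
LONG_MAX = 2**63 - 1     #  9223372036854775807

# The value the error message says could not be parsed.
reported = 18446744073709551615


def fits_in_long(value: int) -> bool:
    """Return True if value can be stored in a signed 64-bit long."""
    return LONG_MIN <= value <= LONG_MAX


print(reported == 2**64 - 1)    # True: it is the maximum unsigned 64-bit value
print(fits_in_long(reported))   # False: it exceeds LONG_MAX
```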

None of the other cgroup processes are indicated as failing, but none of them show up in my index either. I have deleted my index and repushed the template even though it looked the same as the one that was registered on Elasticsearch.

Any ideas on how to fix these two issues? I found the first while troubleshooting the second.
Thanks.


(Andrew Kroh) #2

Instead of using the binary directly, try using the wrapper script that configures the paths to be consistent with how the service works.

sudo metricbeat.sh -e -configtest
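
A rough sketch of why the paths matter, assuming a relative template.path is joined onto the Beat's config directory (which is what the error message in the first post suggests) and assuming the wrapper script points the config directory at /etc/metricbeat. The function name resolve_template_path is hypothetical and this is not Metricbeat's actual code:

```python
import posixpath


def resolve_template_path(config_dir: str, template_path: str) -> str:
    """Join a relative template path onto the config directory.

    Absolute paths are returned unchanged. This mirrors the ASSUMED
    behaviour described above, not Metricbeat's real implementation.
    """
    if posixpath.isabs(template_path):
        return template_path
    return posixpath.join(config_dir, template_path)


# Binary run directly from its own directory: the lookup lands next to the binary.
print(resolve_template_path("/usr/share/metricbeat/bin", "metricbeat.template.json"))

# Config directory set to /etc/metricbeat (assumed to be what the wrapper does).
print(resolve_template_path("/etc/metricbeat", "metricbeat.template.json"))
```

Under the same assumption, giving template.path as an absolute path would sidestep the difference between the two invocations.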

For your filter you can try something like this. Notice I dropped the "system.process" from the key name in the filter.

metricbeat.modules:
- module: system
  metricsets:
  - cpu
  - load
  - core
  - diskio
  - filesystem
  - fsstat
  - memory
  - network
  - process
  - socket
  enabled: true
  period: 10s 
  processes: ['.*']
  cgroups: true
  filters:
  - drop_event:
     when:
       equals:
         cgroup.memory.kmem.limit.bytes: 18446744073709551615

Filter example: https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-system-filesystem.html


(Cliff Sobchuk) #3

Thanks Andrew. The config test passes. I missed the '-' the first time I used the drop_event syntax.


(Cliff Sobchuk) #4

Hi Andrew. The config test passed, but I don't get any cgroup information in the Metricbeat output. I confirmed with lscgroup that the cgroups exist (example):
lscgroup
cpuset:/
cpuset:/lxc
cpuset:/lxc/osa-control_ceilometer_collector_container-719f16a1
cpuset:/lxc/osa-control_aodh_container-645e5c0c
cpuset:/lxc/osa-control_ceilometer_api_container-7dfbecb6
cpuset:/lxc/osa-control_horizon_container-80a7dc5d
cpuset:/lxc/osa-control_heat_engine_container-a61b2f7c
cpuset:/lxc/osa-control_heat_apis_container-1fe1c0ef
cpuset:/lxc/osa-control_neutron_agents_container-000a08c9
cpuset:/lxc/osa-control_nova_scheduler_container-c1c6c12c
cpuset:/lxc/osa-control_nova_api_os_compute_container-c1fcd940
cpuset:/lxc/osa-control_nova_api_metadata_container-82068c71
cpuset:/lxc/osa-control_nova_console_container-168dede3
cpuset:/lxc/osa-control_nova_conductor_container-91f4d773
cpuset:/lxc/osa-control_nova_cert_container-c51672b8
cpuset:/lxc/osa-control_glance_container-fc73174a
cpuset:/lxc/osa-control_keystone_container-b718a99a
cpuset:/lxc/osa-control_rsyslog_container-6c1d47d1
cpuset:/lxc/osa-control_utility_container-e8759302
cpuset:/lxc/osa-control_rabbit_mq_container-8a484a0f
cpuset:/lxc/osa-control_galera_container-5651fd42
cpuset:/lxc/osa-control_repo_container-c9de8542
cpuset:/lxc/osa-control_memcached_container-e5e45f70
cpuset:/lxc/osa-control_cinder_api_container-6af29d50
cpuset:/lxc/osa-control_neutron_server_container-21a1c72a
cpuset:/lxc/osa-control_cinder_scheduler_container-51a3afb5

as well as:
cgget -g cpuacct /lxc/osa-control_ceilometer_collector_container-719f16a1
/lxc/osa-control_ceilometer_collector_container-719f16a1:
cpuacct.stat: user 353616
system 214881
cpuacct.usage_percpu: 3294665505975 3270168363357
cpuacct.usage: 6564833869332
Yet there is no successful non-zero beat information (example):
2017-05-15T23:15:40Z INFO Non-zero metrics in the last 30s: fetches.system-load.success=3 fetches.system-network.events=192 fetches.system-network.success=3 fetches.system-cpu.success=3 fetches.system-diskio.success=3 fetches.system-socket.success=3 fetches.system-process.events=441 libbeat.es.publish.read_bytes=10192 fetches.system-core.events=6 fetches.system-process.success=3 libbeat.es.publish.write_bytes=495758 libbeat.es.published_and_acked_events=913 fetches.system-diskio.events=12 libbeat.publisher.messages_in_worker_queues=913 libbeat.publisher.published_events=913 fetches.system-filesystem.success=3 fetches.system-core.success=3 fetches.system-fsstat.success=3 fetches.system-socket.events=202 libbeat.es.call_count.PublishEvents=21 fetches.system-filesystem.events=48 fetches.system-fsstat.events=3 fetches.system-memory.success=3 fetches.system-memory.events=3 fetches.system-cpu.events=3 fetches.system-load.events=3
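
A small sketch for reading a "Non-zero metrics" line like the one above: it splits the key=value pairs and lists the per-metricset event counts. The helper names parse_metrics and event_counts are made up for this example, and the sample line is shortened from the log above. These counters only show that a metricset produced events; they do not say whether those events carried cgroup data.

```python
def parse_metrics(line: str) -> dict:
    """Parse the key=value counters from a 'Non-zero metrics' log line."""
    _, _, rest = line.partition("Non-zero metrics")
    _, _, payload = rest.partition(": ")
    metrics = {}
    for token in payload.split():
        key, sep, value = token.partition("=")
        if sep and value.isdigit():
            metrics[key] = int(value)
    return metrics


def event_counts(metrics: dict) -> dict:
    """Keep only the fetches.<metricset>.events counters, keyed by metricset."""
    prefix, suffix = "fetches.", ".events"
    return {
        key[len(prefix):-len(suffix)]: value
        for key, value in metrics.items()
        if key.startswith(prefix) and key.endswith(suffix)
    }


sample = (
    "2017-05-15T23:15:40Z INFO Non-zero metrics in the last 30s: "
    "fetches.system-process.events=441 fetches.system-process.success=3 "
    "libbeat.publisher.published_events=913"
)
print(event_counts(parse_metrics(sample)))  # {'system-process': 441}
```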

The filter should only apply to that one entity, but it was the only one that was being reported previously and I thought it was because the failure prevented other process reports.


(Cliff Sobchuk) #5

Correcting my typo: that should read 'no successful non-zero beat information for the cgroups.'


(Andrew Kroh) #6

You could enable debug to get some more insight into what's happening. I would enable only the process metricset to limit the amount of output.

sudo metricbeat.sh -e -d "*"


(Cliff Sobchuk) #7

Thanks Andrew,

I don't see anything other than the indication that CGroups are enabled and experimental. Nothing in the logs indicates that they are actually being queried.

 sudo metricbeat.sh -e -d "*" -c /etc/metricbeat/metricbeat.yml
2017/05/16 04:01:54.813311 beat.go:267: INFO Home path: [/usr/share/metricbeat] Config path: [/etc/metricbeat] Data path: [/var/lib/metricbeat] Logs path: [/var/log/metricbeat]
2017/05/16 04:01:54.813553 beat.go:177: INFO Setup Beat: metricbeat; Version: 5.2.2
2017/05/16 04:01:54.813431 logp.go:219: INFO Metrics logging every 30s
2017/05/16 04:01:54.813945 output.go:167: INFO Loading template enabled. Reading template file: /etc/metricbeat/metricbeat.template.json
2017/05/16 04:01:54.822209 output.go:178: INFO Loading template enabled for Elasticsearch 2.x. Reading template file: /etc/metricbeat/metricbeat.template-es2x.json
2017/05/16 04:01:54.828623 client.go:120: INFO Elasticsearch url: http://192.168.0.40:9200
2017/05/16 04:01:54.828803 outputs.go:106: INFO Activated elasticsearch as output plugin.
2017/05/16 04:01:54.829025 publish.go:291: INFO Publisher name: osa-control
2017/05/16 04:01:54.830891 async.go:63: INFO Flush Interval set to: 1s
2017/05/16 04:01:54.831039 async.go:64: INFO Max Bulk Size set to: 50
2017/05/16 04:01:54.831246 metricbeat.go:25: INFO Register [ModuleFactory:[docker, mongodb, mysql, postgresql, system], MetricSetFactory:[apache/status, couchbase/bucket, couchbase/cluster, couchbase/node, docker/container, docker/cpu, docker/diskio, docker/info, docker/memory, docker/network, haproxy/info, haproxy/stat, kafka/consumergroup, kafka/partition, mongodb/status, mysql/status, nginx/stubstatus, postgresql/activity, postgresql/bgwriter, postgresql/database, prometheus/collector, prometheus/stats, redis/info, redis/keyspace, system/core, system/cpu, system/diskio, system/filesystem, system/fsstat, system/load, system/memory, system/network, system/process, system/socket, zookeeper/mntr]]
2017/05/16 04:01:54.831584 process.go:66: WARN EXPERIMENTAL: Cgroup is enabled for the system.process MetricSet.
2017/05/16 04:01:54.833221 beat.go:207: INFO metricbeat start running.
2017/05/16 04:01:55.751126 client.go:652: INFO Connected to Elasticsearch version 5.2.0
2017/05/16 04:01:55.751265 output.go:214: INFO Trying to load template for client: http://192.168.0.40:9200
2017/05/16 04:01:55.753133 output.go:221: INFO Existing template will be overwritten, as overwrite is enabled.
2017/05/16 04:01:55.789788 client.go:582: INFO Elasticsearch template with name 'metricbeat' loaded
2017/05/16 04:02:24.813965 logp.go:230: INFO Non-zero metrics in the last 30s: libbeat.es.publish.write_bytes=306439 libbeat.publisher.messages_in_worker_queues=450 fetches.system-process.events=450 libbeat.es.call_count.PublishEvents=9 libbeat.es.published_and_acked_events=450 libbeat.es.publish.read_bytes=5181 libbeat.publisher.published_events=450 fetches.system-process.success=3
2017/05/16 04:02:54.813961 logp.go:230: INFO Non-zero metrics in the last 30s: libbeat.es.publish.read_bytes=4556 libbeat.publisher.messages_in_worker_queues=450 fetches.system-process.events=450 libbeat.es.call_count.PublishEvents=9 fetches.system-process.success=3 libbeat.publisher.published_events=450 libbeat.es.publish.write_bytes=274096 libbeat.es.published_and_acked_events=450
2017/05/16 04:03:24.814021 logp.go:230: INFO Non-zero metrics in the last 30s: fetches.system-process.success=3 libbeat.es.published_and_acked_events=450 fetches.system-process.events=450 libbeat.es.publish.read_bytes=4550 libbeat.es.publish.write_bytes=274095 libbeat.es.call_count.PublishEvents=9 libbeat.publisher.published_events=450 libbeat.publisher.messages_in_worker_queues=450
2017/05/16 04:03:54.814023 logp.go:230: INFO Non-zero metrics in the last 30s: libbeat.publisher.published_events=450 fetches.system-process.events=450 libbeat.es.call_count.PublishEvents=9 libbeat.es.published_and_acked_events=450 libbeat.publisher.messages_in_worker_queues=450 libbeat.es.publish.write_bytes=274097 libbeat.es.publish.read_bytes=4550 fetches.system-process.success=3
2017/05/16 04:04:24.814043 logp.go:230: INFO Non-zero metrics in the last 30s: fetches.system-process.success=3 fetches.system-process.events=450 libbeat.publisher.messages_in_worker_queues=450 libbeat.es.publish.write_bytes=274096 libbeat.es.published_and_acked_events=450 libbeat.es.publish.read_bytes=4567 libbeat.es.call_count.PublishEvents=9 libbeat.publisher.published_events=450

(Andrew Kroh) #8

It seems as if debug didn't get enabled for that test run. Can you try adding -E logging.level=debug to the flags?


(Cliff Sobchuk) #9

Thanks Andrew, that indeed provided debug information. I don't see any cgroup values (assuming that they should appear as the fields listed in the system.process fields information).
There is only one line in the output that contains cgroup information, and it is a mapping that is 30,331 characters long. I have parsed this into what I believe are the correct individual mappings (354 lines). Do you want me to place the output (in both forms, the single line and the parsed lines) in pastebin or somewhere else? I don't see any values in this statement, only the mappings. For example:
2017/05/16 13:10:54.067271 client.go:667:
DBG PUT http://192.168.0.40:9200/_template/metricbeat
map[mappings:map[default:map[_all:map[norms:false] _meta:map[version:5.2.2]
dynamic_templates:[map[strings_as_keyword:map[mapping:map[ignore_above:1024 type:keyword]
match_mapping_type:string]]]
properties:map[beat:map[properties:map[hostname:map[type:keyword ignore_above:1024] name:map[ignore_above:1024 type:keyword] version:map[ignore_above:1024 type:keyword]]]
haproxy:map[properties:map[stat:map[properties:map[compressor:map[properties:map[bypassed:map[properties:map[bytes:map[type:long]]]
in:map[properties:map[bytes:map[type:long]]] out:map[properties:map[bytes:map[type:long]]]
response:map[properties:map[bytes:map[type:long]]]]]
request:map[properties:map[queued:map[properties:map[current:map[type:long]
max:map[type:long]]]
rate:map[properties:map[max:map[type:long]
value:map[type:long]]]

The line directly after the cgroup map lines is a system process capture:
2017/05/16 13:10:54.067326 client.go:184: DBG Publish: {
"@timestamp": "2017-05-16T13:10:53.084Z",
"beat": {
"hostname": "osa-control",
"name": "osa-control",
"version": "5.2.2"
},
"metricset": {
"module": "system",
"name": "process",
"rtt": 747558
},
"system": {
"process": {
"cpu": {
"start_time": "2017-05-11T14:10:11.000Z",
"total": {
"pct": 0.000000
}
},
"fd": {
"limit": {
"hard": 4096,
"soft": 1024
},
"open": 0
},
"memory": {
"rss": {
"bytes": 0,
"pct": 0.000000
},
"share": 0,
"size": 0
},
"name": "kworker/0:1",
"pgid": 0,
"pid": 5509,
"ppid": 2,
"state": "sleeping",
"username": "root"
}
},
"type": "metricsets"
}

Here is the output from top with CGROUP field displayed:
top - 14:14:58 up 82 days, 18:14, 2 users, load average: 1.14, 1.27, 1.40
Tasks: 598 total, 1 running, 597 sleeping, 0 stopped, 0 zombie
%Cpu(s): 29.2 us, 6.9 sy, 0.0 ni, 62.2 id, 1.2 wa, 0.0 hi, 0.5 si, 0.0 st
KiB Mem: 16047352 total, 15653396 used, 393956 free, 168888 buffers
KiB Swap: 0 total, 0 used, 0 free. 2324492 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                   CGROUPS
28273 cinder    20   0  370448  98540   4368 S   3.6  0.6  77:50.81 apache2                                                   11:hugetlb:/lxc/osa-control_keystone_container-b718a99a,+
23752 cinder    20   0  191000  91944   2552 S   3.3  0.6   1951:34 neutron-server                                            11:hugetlb:/lxc/osa-control_neutron_server_container-21a+
18801 cinder    20   0  236096 136844   2300 S   3.0  0.9   4417:01 nova-conductor                                            11:hugetlb:/lxc/osa-control_nova_conductor_container-91f+
23754 cinder    20   0  177912  78580   2396 S   3.0  0.5   1898:05 neutron-server                                            11:hugetlb:/lxc/osa-control_neutron_server_container-21a+
27621 sshd      20   0 1861260 681328   2096 S   3.0  4.2   2531:02 beam.smp                                                  11:hugetlb:/lxc/osa-control_rabbit_mq_container-8a484a0f+
15428 cinder    20   0  302728 168344   2500 S   2.3  1.0   1430:50 nova-api-os-com                                           11:hugetlb:/lxc/osa-control_nova_api_os_compute_containe+
28272 cinder    20   0  370448  98508   4364 S   2.3  0.6  78:02.62 apache2                                                   11:hugetlb:/lxc/osa-control_keystone_container-b718a99a,+
28275 cinder    20   0  370448  98508   4364 S   2.3  0.6  78:17.33 apache2                                                   11:hugetlb:/lxc/osa-control_keystone_container-b718a99a,+
28277 cinder    20   0  371728 100556   4368 S   2.3  0.6  32:42.06 apache2                                                   11:hugetlb:/lxc/osa-control_keystone_container-b718a99a,+
 1424 cinder    20   0  190772  73924   2476 S   2.0  0.5   1173:24 cinder-schedule                                           11:hugetlb:/lxc/osa-control_cinder_scheduler_container-5+
18097 cinder    20   0  202832 103516   2368 S   2.0  0.6   1351:18 nova-scheduler                                            11:hugetlb:/lxc/osa-control_nova_scheduler_container-c1c+
18353 cinder    20   0  192800  74960   2580 S   2.0  0.5   2482:25 cinder-volume                                             -
22658 cinder    20   0  201348  99140   2152 S   2.0  0.6   1287:54 nova-api-metada                                           11:hugetlb:/lxc/osa-control_nova_api_metadata_container-+
30334 cinder    20   0  138124  54352   3320 S   2.0  0.3   1116:24 neutron-l3-agen                                           11:hugetlb:/lxc/osa-control_neutron_agents_container-000+
 2325 cinder    20   0  208740 109468   2280 S   1.6  0.7   1609:06 nova-consoleaut                                           11:hugetlb:/lxc/osa-control_nova_console_container-168de+
 3655 ubuntu    20   0   25788   2372   1248 R   1.6  0.0   0:05.69 top                                                       2:name=systemd:/user/1000.user/1850.session
 7268 cinder    20   0  165160  81376   3312 S   1.6  0.5   1913:04 neutron-linuxbr                                           11:hugetlb:/lxc/osa-control_neutron_agents_container-000+

Am I wrong to expect cgroup process information to show up under fields like the following?
system.process.cgroup.blkio.id
system.process.cgroup.blkio.path
system.process.cgroup.blkio.total.bytes
system.process.cgroup.blkio.total.ios
system.process.cgroup.cpu.cfs.p

For system.process.cgroup.blkio.id, Kibana shows this note: "This field is present in your elasticsearch mapping but not in any documents in the search results. You may still be able to visualize or search on it."

The above search was * in Kibana.


(Andrew Kroh) #10

Is this output with the filter in place to drop events? Could you remove that temporarily? I don't know if there is any logging in place for filtered events.

And yes, that's correct: cgroup data should show up under those fields. Here's an example event.

{
  "@timestamp": "2017-05-16T14:40:03.099Z",
  "beat": {
    "hostname": "audit-test",
    "name": "audit-test",
    "version": "6.0.0-alpha1"
  },
  "metricset": {
    "module": "system",
    "name": "process",
    "rtt": 103478
  },
  "system": {
    "process": {
      "cgroup": {
        "blkio": {
          "id": "user.slice",
          "path": "/user.slice",
          "total": {
            "bytes": 1307828224,
            "ios": 13770
          }
        },
        "cpu": {
          "cfs": {
            "period": {
              "us": 100000
            },
            "quota": {
              "us": 0
            },
            "shares": 1024
          },
          "id": "user.slice",
          "path": "/user.slice",
          "rt": {
            "period": {
              "us": 0
            },
            "runtime": {
              "us": 0
            }
          },
          "stats": {
            "periods": 0,
            "throttled": {
              "ns": 0,
              "periods": 0
            }
          }
        },
        "cpuacct": {
          "id": "user.slice",
          "path": "/user.slice",
          "percpu": {
            "1": 64860398403
          },
          "stats": {
            "system": {
              "ns": 20000000000
            },
            "user": {
              "ns": 43100000000
            }
          },
          "total": {
            "ns": 64860398403
          }
        },
        "id": "user.slice",
        "memory": {
          "id": "user.slice",
          "kmem": {
            "failures": 0,
            "limit": {
              "bytes": 9223372036854771712
            },
            "usage": {
              "bytes": 59850752,
              "max": {
                "bytes": 371179520
              }
            }
          },
          "kmem_tcp": {
            "failures": 0,
            "limit": {
              "bytes": 9223372036854771712
            },
            "usage": {
              "bytes": 59850752,
              "max": {
                "bytes": 371179520
              }
            }
          },
          "mem": {
            "failures": 0,
            "limit": {
              "bytes": 9223372036854771712
            },
            "usage": {
              "bytes": 59850752,
              "max": {
                "bytes": 371179520
              }
            }
          },
          "memsw": {
            "failures": 0,
            "limit": {
              "bytes": 9223372036854771712
            },
            "usage": {
              "bytes": 59850752,
              "max": {
                "bytes": 371179520
              }
            }
          },
          "path": "/user.slice",
          "stats": {
            "active_anon": {
              "bytes": 12898304
            },
            "active_file": {
              "bytes": 30760960
            },
            "cache": {
              "bytes": 39411712
            },
            "hierarchical_memory_limit": {
              "bytes": 9223372036854771712
            },
            "hierarchical_memsw_limit": {
              "bytes": 0
            },
            "inactive_anon": {
              "bytes": 0
            },
            "inactive_file": {
              "bytes": 8646656
            },
            "major_page_faults": 1859,
            "mapped_file": {
              "bytes": 17395712
            },
            "page_faults": 2398196,
            "pages_in": 1877596,
            "pages_out": 1864826,
            "rss": {
              "bytes": 12894208
            },
            "rss_huge": {
              "bytes": 0
            },
            "swap": {
              "bytes": 0
            },
            "unevictable": {
              "bytes": 0
            }
          }
        },
        "path": "/user.slice"
      },
      "cmdline": "/usr/share/metricbeat/bin/metricbeat -e -d * -c metricbeat.process.yml -E logging.level=debug -strict.perms=false",
      "cpu": {
        "start_time": "2017-05-16T14:39:57.000Z",
        "total": {
          "pct": 0.080100
        }
      },
      "cwd": "/home/andrew_kroh",
      "fd": {
        "limit": {
          "hard": 65536,
          "soft": 1024
        },
        "open": 4
      },
      "memory": {
        "rss": {
          "bytes": 25128960,
          "pct": 0.040900
        },
        "share": 17137664,
        "size": 230318080
      },
      "name": "metricbeat",
      "pgid": 8402,
      "pid": 8403,
      "ppid": 8402,
      "state": "sleeping",
      "username": "root"
    }
  },
  "type": "metricsets"
}

(Cliff Sobchuk) #11

Hi Andrew.
Yes, the drop_event filter was in place. I then tried drop_fields instead, and after that I removed the filter completely.

With drop_fields I still get no cgroup output. config file:

    - module: system
      metricsets:
    #    - cpu
    #    - load
    #    - core
    #    - diskio
    #    - filesystem
    #    - fsstat
    #    - memory
    #    - network
        - process
    #    - socket
      enabled: true
      period: 10s
      processes: ['.*']
      cgroups: true
      filters:
       - drop_fields:
           when:
             equals:
               cgroup.memory.kmem.limit.bytes: 18446744073709551615
           fields: ["cgroup.memory.kmem.limit.bytes"]

Without the filter I get cgroup output in the debug log, but I still didn't receive anything in Elasticsearch:

2017/05/16 14:50:09.466361 client.go:184: DBG  Publish: {
      "@timestamp": "2017-05-16T14:50:08.842Z",
      "beat": {
        "hostname": "osa-control",
        "name": "osa-control",
        "version": "5.2.2"
      },
      "metricset": {
        "module": "system",
        "name": "process",
        "rtt": 587541
      },
      "system": {
        "process": {
          "cgroup": {
            "blkio": {
              "id": "osa-control_repo_container-c9de8542",
              "path": "/lxc/osa-control_repo_container-c9de8542",
              "total": {
                "bytes": 2218360832,
                "ios": 76771
              }
            },
            "cpu": {
              "cfs": {
                "period": {
                  "us": 100000
                },
                "quota": {
                  "us": 0
                },
                "shares": 1024
              },
              "id": "osa-control_repo_container-c9de8542",
              "path": "/lxc/osa-control_repo_container-c9de8542",
              "rt": {
                "period": {
                  "us": 0
                },
                "runtime": {
                  "us": 0
                }
              },
              "stats": {
                "periods": 0,
                "throttled": {
                  "ns": 0,
                  "periods": 0
                }
              }
            },
            "cpuacct": {
              "id": "osa-control_repo_container-c9de8542",
              "path": "/lxc/osa-control_repo_container-c9de8542",
              "percpu": {
                "1": 672864347887,
                "2": 786469378761
              },
              "stats": {
                "system": {
                  "ns": 327340000000
                },
                "user": {
                  "ns": 1166400000000
                }
              },
              "total": {
                "ns": 1459333726648
              }
            },
            "id": "osa-control_repo_container-c9de8542",
            "memory": {
              "id": "osa-control_repo_container-c9de8542",
              "kmem": {
                "failures": 0,
                "limit": {
                  "bytes": 18446744073709551615
                },
                "usage": {
                  "bytes": 52035584,
                  "max": {
                    "bytes": 3368857600
                  }
                }
              },
              "kmem_tcp": {
                "failures": 0,
                "limit": {
                  "bytes": 18446744073709551615
                },
                "usage": {
                  "bytes": 52035584,
                  "max": {
                    "bytes": 3368857600
                  }
                }
              },
              "mem": {
                "failures": 0,
                "limit": {
                  "bytes": 18446744073709551615
                },
                "usage": {
                  "bytes": 52035584,
                  "max": {
                    "bytes": 3368857600
                  }
                }
              },
              "memsw": {
                "failures": 0,
                "limit": {
                  "bytes": 18446744073709551615
                },
                "usage": {
                  "bytes": 52035584,
                  "max": {
                    "bytes": 3368857600
                  }
                }
              },
              "path": "/lxc/osa-control_repo_container-c9de8542",
              "stats": {
                "active_anon": {
                  "bytes": 15376384
                },
                "active_file": {
                  "bytes": 21200896
                },
                "cache": {
                  "bytes": 36724736
                },
                "hierarchical_memory_limit": {
                  "bytes": 18446744073709551615
                },
                "hierarchical_memsw_limit": {
                  "bytes": 0
                },
                "inactive_anon": {
                  "bytes": 929792
                },
                "inactive_file": {
                  "bytes": 14528512
                },
                "major_page_faults": 6722,
                "mapped_file": {
                  "bytes": 479232
                },
                "page_faults": 61474112,
                "pages_in": 20083905,
                "pages_out": 20128944,
                "rss": {
                  "bytes": 15310848
                },
                "rss_huge": {
                  "bytes": 2097152
                },
                "swap": {
                  "bytes": 0
                },
                "unevictable": {
                  "bytes": 0
                }
              }
            },
            "path": "/lxc/osa-control_repo_container-c9de8542"
          },
          "cpu": {
            "start_time": "2017-02-22T20:58:08.000Z",
            "total": {
              "pct": 0.000000
            }
          },
          "fd": {
            "limit": {
              "hard": 4096,
              "soft": 1024
            },
            "open": 15
          },
          "memory": {
            "rss": {
              "bytes": 2134016,
              "pct": 0.000100
            },
            "share": 65536,
            "size": 250699776
          },
          "name": "nginx",
          "pgid": 376,
          "pid": 376,
          "ppid": 30394,
          "state": "sleeping",
          "username": "root"
        }
      },
      "type": "metricsets"
    }

(Andrew Kroh) #12

Can you try this config, which drops the "limit" fields that have values >= 2^63? If that works, then I think we need to build some kind of check into the cgroup feature to do some magic on the limit values.

metricbeat.modules:
- module: system
  metricsets: [process]
  filters:
  - drop_fields:
      fields: ['cgroup.memory.kmem.limit.bytes']
      when.range:
        cgroup.memory.kmem.limit.bytes.gte: 9223372036854775808
  - drop_fields:
      fields: ['cgroup.memory.kmem_tcp.limit.bytes']
      when.range:
        cgroup.memory.kmem_tcp.limit.bytes.gte: 9223372036854775808
  - drop_fields:
      fields: ['cgroup.memory.mem.limit.bytes']
      when.range:
        cgroup.memory.mem.limit.bytes.gte: 9223372036854775808
  - drop_fields:
      fields: ['cgroup.memory.memsw.limit.bytes']
      when.range:
        cgroup.memory.memsw.limit.bytes.gte: 9223372036854775808
  - drop_fields:
      fields: ['cgroup.memory.stats.hierarchical_memory_limit.bytes']
      when.range:
        cgroup.memory.stats.hierarchical_memory_limit.bytes.gte: 9223372036854775808
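
For anyone wondering where the number in those conditions comes from: 9223372036854775808 is 2^63, one more than the largest value a signed 64-bit integer can hold. That is presumably why an "unlimited" cgroup limit cannot be stored in a field mapped as an Elasticsearch long. Below is a minimal Python sketch of the same check. The function names and the sample values are made up for illustration and are not part of Metricbeat.

```python
# Largest value a signed 64-bit integer (e.g. an Elasticsearch "long") can hold.
INT64_MAX = 2**63 - 1

# Threshold used in the drop_fields conditions above.
THRESHOLD = 9223372036854775808


def fits_in_signed_64(value: int) -> bool:
    """Return True if value can be stored in a signed 64-bit integer."""
    return -(2**63) <= value <= INT64_MAX


def drop_oversized_limits(limits: dict) -> dict:
    """Keep only the limit values below the threshold (hypothetical helper)."""
    return {name: value for name, value in limits.items() if value < THRESHOLD}


# Hypothetical sample: one "unlimited" sentinel and one real limit.
sample_limits = {
    "cgroup.memory.mem.limit.bytes": 2**64 - 1,
    "cgroup.memory.kmem.limit.bytes": 536870912,
}
kept = drop_oversized_limits(sample_limits)
print(kept)
```

With that sample only the kmem entry survives, which mirrors what the range condition does per field in the config.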

(Cliff Sobchuk) #13

Cool. I copied and pasted that section in place of my filters section, right after cgroups: true, and that seems to have worked. The events now show up both in the logs and in Kibana (sample below the log). Yay! :smile:

2017/05/16 18:50:45.349675 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-16T18:50:44.455Z",
  "beat": {
    "hostname": "osa-control",
    "name": "osa-control",
    "version": "5.2.2"
  },
  "metricset": {
    "module": "system",
    "name": "process",
    "rtt": 684005
  },
  "system": {
    "process": {
      "cgroup": {
        "blkio": {
          "id": "osa-control_repo_container-c9de8542",
          "path": "/lxc/osa-control_repo_container-c9de8542",
          "total": {
            "bytes": 2218430464,
            "ios": 76788
          }
        },
        "cpu": {
          "cfs": {
            "period": {
              "us": 100000
            },
            "quota": {
              "us": 0
            },
            "shares": 1024
          },
          "id": "osa-control_repo_container-c9de8542",
          "path": "/lxc/osa-control_repo_container-c9de8542",
          "rt": {
            "period": {
              "us": 0
            },
            "runtime": {
              "us": 0
            }
          },
          "stats": {
            "periods": 0,
            "throttled": {
              "ns": 0,
              "periods": 0
            }
          }
        },
        "cpuacct": {
          "id": "osa-control_repo_container-c9de8542",
          "path": "/lxc/osa-control_repo_container-c9de8542",
          "percpu": {
            "1": 673077988428,
            "2": 786703655244
          },
          "stats": {
            "system": {
              "ns": 327580000000
            },
            "user": {
              "ns": 1166580000000
            }
          },
          "total": {
            "ns": 1459781643672
          }
        },
        "id": "osa-control_repo_container-c9de8542",
        "memory": {
          "id": "osa-control_repo_container-c9de8542",
          "kmem": {
            "failures": 0,
            "limit": {},
            "usage": {
              "bytes": 50839552,
              "max": {
                "bytes": 3368857600
              }
            }
          },
          "kmem_tcp": {
            "failures": 0,
            "limit": {},
            "usage": {
              "bytes": 50839552,
              "max": {
                "bytes": 3368857600
              }
            }
          },
          "mem": {
            "failures": 0,
            "limit": {},
            "usage": {
              "bytes": 50839552,
              "max": {
                "bytes": 3368857600
              }
            }
          },
          "memsw": {
            "failures": 0,
            "limit": {},
            "usage": {
              "bytes": 50839552,
              "max": {
                "bytes": 3368857600
              }
            }
          },
          "path": "/lxc/osa-control_repo_container-c9de8542",
          "stats": {
            "active_anon": {
              "bytes": 15384576
            },
            "active_file": {
              "bytes": 19501056
            },
            "cache": {
              "bytes": 35520512
            },
            "hierarchical_memory_limit": {},
            "hierarchical_memsw_limit": {
              "bytes": 0
            },
            "inactive_anon": {
              "bytes": 929792
            },
            "inactive_file": {
              "bytes": 15024128
            },
            "major_page_faults": 6722,
            "mapped_file": {
              "bytes": 479232
            },
            "page_faults": 61555526,
            "pages_in": 20094429,
            "pages_out": 20139760,
            "rss": {
              "bytes": 15319040
            },
            "rss_huge": {
              "bytes": 2097152
            },
            "swap": {
              "bytes": 0
            },
            "unevictable": {
              "bytes": 0
            }
          }
        },
        "path": "/lxc/osa-control_repo_container-c9de8542"
      },
      "cpu": {
        "start_time": "2017-02-22T20:58:08.000Z",
        "total": {
          "pct": 0.000000
        }
      },
      "fd": {
        "limit": {
          "hard": 4096,
          "soft": 1024
        },
        "open": 15
      },
      "memory": {
        "rss": {
          "bytes": 2134016,
          "pct": 0.000100
        },
        "share": 65536,
        "size": 250699776
      },
      "name": "nginx",
      "pgid": 376,
      "pid": 376,
      "ppid": 30394,
      "state": "sleeping",
      "username": "root"
    }
  },
  "type": "metricsets"
}
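
In the event above, the dropped fields show up as empty "limit": {} objects. A quick way to confirm which fields were emptied is to walk the JSON and list every empty object. This is only a sketch: the helper name is made up, and the sample is a trimmed copy of the event above rather than the full document.

```python
import json


def empty_objects(node, prefix=""):
    """Yield the dotted path of every empty JSON object under node."""
    if isinstance(node, dict):
        if not node and prefix:
            yield prefix
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from empty_objects(value, path)


# Trimmed copy of the published event, keeping two of the memory sections.
sample = json.loads("""
{"system": {"process": {"cgroup": {"memory": {
  "kmem": {"failures": 0, "limit": {}, "usage": {"bytes": 50839552}},
  "mem": {"failures": 0, "limit": {}, "usage": {"bytes": 50839552}}
}}}}}
""")

emptied = list(empty_objects(sample))
for path in emptied:
    print(path)
```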

And a sample from Kibana:
t system.process.cgroup.blkio.id osa-control_cinder_scheduler_container-51a3afb5
t system.process.cgroup.blkio.path /lxc/osa-control_cinder_scheduler_container-51a3afb5
# system.process.cgroup.blkio.total.bytes 1,081,933,824
# system.process.cgroup.blkio.total.ios 39,569
# system.process.cgroup.cpu.cfs.period.us 100,000
# system.process.cgroup.cpu.cfs.quota.us 0
# system.process.cgroup.cpu.cfs.shares 1,024
t system.process.cgroup.cpu.id osa-control_cinder_scheduler_container-51a3afb5
t system.process.cgroup.cpu.path /lxc/osa-control_cinder_scheduler_container-51a3afb5

