I'm running Metricbeat 6.2.2-1 on RHEL 7.5 with the following kernel version.
# uname -r -v
3.10.0-862.11.6.el7.x86_64 #1 SMP Fri Aug 10 16:55:11 UTC 2018
I'm seeing the below errors in metricbeat log.
2018-09-07T22:28:01.829-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 20913
2018-09-08T06:08:01.933-0600 ERROR process/process.go:454 Error getting process details. pid=10808: error getting process mem for pid=10808: no such process
2018-09-08T09:53:31.785-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 20679
2018-09-08T16:48:41.857-0600 ERROR process/process.go:454 Error getting process details. pid=2508: error getting process file descriptor usage for pid=2508: open /proc/2508/fd: no such file or directory
2018-09-09T02:34:01.949-0600 ERROR process/process.go:454 Error getting process details. pid=32629: error getting process arguments for pid=32629: read /proc/32629/cmdline: no such process
I've enabled the system module with the following content. All other modules are disabled.
- module: system
metricsets:
- cpu # CPU usage
- filesystem # File system usage for each mountpoint
- fsstat # File system summary metrics
- load # CPU load averages
- memory # Memory usage
- network # Network IO
- process # Per process metrics
- process_summary # Process summary
- uptime # System Uptime
#- core # Per CPU core usage
- diskio # Disk IO
#- socket # Sockets and connection info (linux only)
enabled: true
period: 10s
processes: ['.*']
# Configure the metric types that are included by these metricsets.
cpu.metrics: ["percentages"] # The other available options are normalized_percentages and ticks.
core.metrics: ["percentages"] # The other available option is ticks.
Also metricbeat service on the server is started by the root user.
I'm not seeing any error today but on 6th and 7th Sept I was seeing the 'Unknown state <88> error' constantly as shown below.
2018-09-06T16:09:07.153-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 17179
2018-09-06T17:17:07.128-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 1130
2018-09-06T17:27:07.132-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 5740
2018-09-06T17:47:07.135-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 14727
2018-09-06T18:07:07.142-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 24452
2018-09-06T18:08:07.148-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 26307
2018-09-06T18:48:07.131-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 12059
2018-09-06T18:57:07.137-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 14872
2018-09-06T18:59:07.142-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 28334
2018-09-06T19:05:07.137-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 16320
2018-09-06T19:41:07.140-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 17912
2018-09-06T19:48:07.147-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 7750
2018-09-06T19:52:07.144-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 24827
2018-09-06T20:03:07.145-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 30794
2018-09-06T20:29:07.134-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 5402
2018-09-06T20:51:07.138-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 17802
2018-09-06T21:06:07.136-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 9009
2018-09-06T22:23:07.151-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 31005
2018-09-06T23:01:07.132-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 13664
2018-09-06T23:22:07.143-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 25080
2018-09-06T23:22:07.143-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 25081
2018-09-07T00:10:07.133-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 12164
2018-09-07T00:17:07.267-0600 ERROR process/process.go:454 Error getting process details. pid=1944: error getting process file descriptor usage for pid=1944: read /proc/1944/limits: no such process
2018-09-07T01:12:07.135-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 11312
2018-09-07T01:12:07.135-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 11313
2018-09-07T01:39:07.137-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 19529
2018-09-07T02:19:07.131-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 6076
2018-09-07T02:30:07.142-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 12342
2018-09-07T02:31:07.133-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 13946
2018-09-07T02:46:07.136-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 5073
2018-09-07T03:28:47.288-0600 ERROR process/process.go:454 Error getting process details. pid=5532: error getting process file descriptor usage for pid=5532: open /proc/5532/fd: no such file or directory
2018-09-07T03:30:07.134-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 7928
2018-09-07T03:37:07.143-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 30216
2018-09-07T04:16:07.136-0600 ERROR process_summary/process_summary.go:79 Unknown state <88> for process with pid 14694
We haven't seen yet that metricbeat tries to fetch information about dead processes. Maybe you can share your metricbeat configuration and we can try to take a look.
As I mentioned above I only have the system module enabled. I have shared the content of the module above.
Is there any other configuration you are looking for?
Sorry, I completely overlooked at the config. So as far as I could read, it is possible that you can see dead processes but it's very unlikely.
Even if metricbeat is "caching" the PID, as soon as it tries to get the state from the OS, it won't find the file in /proc/[pid]/status so it won't get a 88 status either. This is easy to check, just execute in a shell:
Wait until the process has finished after 30 seconds and repeat the cat program, you'll naturally get a cat: /proc/9999/status: No such file or directory.
There is not special magic within metricbeat to cache PID process between calls. It must be somethiing with the way that RHEL 7.5 deals with removing processes information from proc folder.
So the only next step I can think about now is to lower or raise the check period, just to check if the error messages are reduced. We are also considering to simply not log this message because it provides little valuable information anyways.
We have just merged and enhancement that will remove the error messages and, instead, count them as the value "dead" within the metricset event. Also, we will print the returned character instead of the ASCII code.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.