Packetbeat HighCPU monitoring memcache


(Tim Arp) #1

Hi,
I have been monitoring our memcached servers with packetbeat. This was noticed after deploying packetbeat 5.4.0, it may have been happening before, but I'm not 100% sure. I will consistently get packetbeat taking all of the cpu on the server. This impacts the performance of memcache. For now, we have shutoff packetbeat.

I have 4 instances of memcache running on the server. Top looks like this before packetbeat.
top - 07:51:32 up 100 days, 6:35, 1 user, load average: 0.19, 0.13, 0.05
Tasks: 164 total, 1 running, 163 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.9%us, 3.5%sy, 0.0%ni, 94.3%id, 0.1%wa, 0.0%hi, 1.1%si, 0.0%st
Mem: 8054356k total, 7672104k used, 382252k free, 65656k buffers
Swap: 8388604k total, 20220k used, 8368384k free, 4616300k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20020 memcache 20 0 1069m 778m 712 S 6.3 9.9 1832:06 memcached
20052 memcache 20 0 1067m 783m 712 S 6.0 10.0 1839:16 memcached
6470 memcache 20 0 558m 190m 708 S 3.3 2.4 2187:18 memcached
13 root 20 0 0 0 0 S 0.3 0.0 29:47.84 ksoftirqd/2
1419 root 20 0 63608 3716 3300 S 0.3 0.0 130:39.73 vmtoolsd
6964 root 20 0 366m 8524 5308 S 0.3 0.1 11:39.54 pxp-agent

Starting up packetbeat it now looks like this.
top - 07:52:45 up 100 days, 6:36, 1 user, load average: 0.32, 0.19, 0.08
Tasks: 166 total, 1 running, 165 sleeping, 0 stopped, 0 zombie
Cpu(s): 24.3%us, 4.9%sy, 0.0%ni, 68.7%id, 0.1%wa, 0.0%hi, 2.0%si, 0.0%st
Mem: 8054356k total, 7778452k used, 275904k free, 65912k buffers
Swap: 8388604k total, 20220k used, 8368384k free, 4616040k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20913 root 20 0 457m 110m 10m S 77.1 1.4 0:37.20 packetbeat
20020 memcache 20 0 1069m 778m 712 S 9.6 9.9 1832:12 memcached
20052 memcache 20 0 1067m 783m 712 S 7.3 10.0 1839:21 memcached
6470 memcache 20 0 558m 190m 708 S 1.0 2.4 2187:20 memcached
52568 root 20 0 4374m 144m 15m S 0.3 1.8 169:59.43 java
1 root 20 0 30424 1704 1424 S 0.0 0.0 0:09.85 init

The packetbeat.yml is configured like this.

packetbeat.interfaces.device: any
packetbeat.flows:
timeout: 30s
period: 30s
packetbeat.protocols.memcache:
ports: [11215, 11216, 11217, 11223]

output.elasticsearch:
hosts: ['ingest-logcentral.dx.deere.com:9200']


(Tim Arp) #2

packetbeat log part 1
2017-05-23T07:51:51-05:00 INFO Home path: [/usr/share/packetbeat] Config path: [/etc/packetbeat] Data path: [/var/lib/packetbeat] Logs path: [/var/log/packetbeat]
2017-05-23T07:51:51-05:00 INFO Setup Beat: packetbeat; Version: 5.4.0
2017-05-23T07:51:51-05:00 INFO Loading template enabled. Reading template file: /etc/packetbeat/packetbeat.template.json
2017-05-23T07:51:51-05:00 INFO Loading template enabled for Elasticsearch 2.x. Reading template file: /etc/packetbeat/packetbeat.template-es2x.json
2017-05-23T07:51:51-05:00 INFO Loading template enabled for Elasticsearch 6.x. Reading template file: /etc/packetbeat/packetbeat.template-es6x.json
2017-05-23T07:51:51-05:00 INFO Elasticsearch url: http://ingest-logcentral.dx.deere.com:9200
2017-05-23T07:51:51-05:00 INFO Activated elasticsearch as output plugin.
2017-05-23T07:51:51-05:00 INFO Publisher name: ldpn90mc1
2017-05-23T07:51:51-05:00 INFO Flush Interval set to: 1s
2017-05-23T07:51:51-05:00 INFO Max Bulk Size set to: 50
2017-05-23T07:51:51-05:00 INFO Process matching disabled
2017-05-23T07:51:51-05:00 INFO registered protocol plugin: mysql
2017-05-23T07:51:51-05:00 INFO registered protocol plugin: nfs
2017-05-23T07:51:51-05:00 INFO registered protocol plugin: pgsql
2017-05-23T07:51:51-05:00 INFO registered protocol plugin: redis
2017-05-23T07:51:51-05:00 INFO registered protocol plugin: thrift
2017-05-23T07:51:51-05:00 INFO registered protocol plugin: cassandra
2017-05-23T07:51:51-05:00 INFO registered protocol plugin: memcache
2017-05-23T07:51:51-05:00 INFO registered protocol plugin: http
2017-05-23T07:51:51-05:00 INFO registered protocol plugin: mongodb
2017-05-23T07:51:51-05:00 INFO registered protocol plugin: amqp
2017-05-23T07:51:51-05:00 INFO registered protocol plugin: dns
2017-05-23T07:51:51-05:00 INFO packetbeat start running.
2017-05-23T07:51:51-05:00 INFO Connected to Elasticsearch version 5.4.0
2017-05-23T07:51:51-05:00 INFO Trying to load template for client: http://ingest-logcentral.dx.deere.com:9200
2017-05-23T07:51:51-05:00 INFO Template already exists and will not be overwritten.
2017-05-23T07:52:05-05:00 WARN Missing response for non-quiet request
2017-05-23T07:52:18-05:00 WARN Missing response for non-quiet request
2017-05-23T07:52:21-05:00 INFO Non-zero metrics in the last 30s: libbeat.es.call_count.PublishEvents=466 libbeat.es.publish.read_bytes=234003 libbeat.es.publish.write_bytes=18094722 libbeat.es.published_and_acked_events=22726 libbeat.publisher.messages_in_worker_queues=23826 libbeat.publisher.published_events=23827 memcache.unfinished_transactions=4086 memcache.unmatched_requests=2 memcache.unmatched_responses=2707 tcp.dropped_because_of_gaps=5112
2017-05-23T07:52:51-05:00 INFO Non-zero metrics in the last 30s: libbeat.es.call_count.PublishEvents=513 libbeat.es.publish.read_bytes=257996 libbeat.es.publish.write_bytes=17745335 libbeat.es.published_and_acked_events=25093 libbeat.publisher.messages_in_worker_queues=15581 libbeat.publisher.published_events=25093 memcache.unfinished_transactions=4785 memcache.unmatched_responses=2860 tcp.dropped_because_of_gaps=5301


(Tim Arp) #3

packetbeat log part 2

2017-05-23T07:52:53-05:00 WARN Missing response for non-quiet request
2017-05-23T07:52:56-05:00 ERR ParseMemcache(TCP) exception. Recovering, but please report this: runtime error: slice bounds out of range.
2017-05-23T07:52:56-05:00 ERR Stacktrace: goroutine 49 [running]:
runtime/debug.Stack(0xbcf45d, 0x2b, 0xc42003f5f0)
/usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/elastic/beats/libbeat/logp.Recover(0xbc388b, 0x1c)
/go/src/github.com/elastic/beats/libbeat/logp/log.go:145 +0x138
panic(0xad3d00, 0xc4200100b0)
/usr/local/go/src/runtime/panic.go:458 +0x243
github.com/elastic/beats/libbeat/common/streambuf.(*Buffer).Collect(0xc424665730, 0xffffffffffffece3, 0xc42003f7c8, 0x1501, 0xc423da4000, 0x1321c86480, 0xc42003f7c0)
/go/src/github.com/elastic/beats/libbeat/common/streambuf/streambuf.go:335 +0xb0
github.com/elastic/beats/packetbeat/protos/memcache.parseDataBinary(0xc424665780, 0xc424665730, 0x10, 0xa9fd20, 0xc42003f858)
/go/src/github.com/elastic/beats/packetbeat/protos/memcache/binary.go:363 +0x67
github.com/elastic/beats/packetbeat/protos/memcache.(*parser).dispatch(0xc424665780, 0xc4224c2104, 0xc424665730, 0x95a, 0x11a6300, 0x7ff712409000)
/go/src/github.com/elastic/beats/packetbeat/protos/memcache/parse.go:102 +0x54
github.com/elastic/beats/packetbeat/protos/memcache.(*parser).parse(0xc424665780, 0xc424665730, 0xed0b627a8, 0x1a1fd208, 0x11a4d20, 0xc4224c2130, 0x1, 0x1)
/go/src/github.com/elastic/beats/packetbeat/protos/memcache/parse.go:78 +0x49
github.com/elastic/beats/packetbeat/protos/memcache.(*stream).parse(0xc424665730, 0xed0b627a8, 0xc41a1fd208, 0x11a4d20, 0x1, 0x0, 0xc41fed9ef7)
/go/src/github.com/elastic/beats/packetbeat/protos/memcache/plugin_tcp.go:434 +0x52
github.com/elastic/beats/packetbeat/protos/memcache.(*memcache).memcacheParseTCP(0xc4200a3980, 0xc4244c9860, 0xc4235c0300, 0xc4234bcdd8, 0x0, 0xc4244f2001)
/go/src/github.com/elastic/beats/packetbeat/protos/memcache/plugin_tcp.go:139 +0x2ae
github.com/elastic/beats/packetbeat/protos/memcache.(*memcache).Parse(0xc4200a3980, 0xc4235c0300, 0xc4234bcdd8, 0xc4200a3900, 0xac2a20, 0xc4244c9860, 0x0, 0x0)
/go/src/github.com/elastic/beats/packetbeat/protos/memcache/plugin_tcp.go:90 +0x137
github.com/elastic/beats/packetbeat/protos/tcp.(*TCPStream).addPacket(0xc42003fb60, 0xc4235c0300, 0xc42131b2a0)
/go/src/github.com/elastic/beats/packetbeat/protos/tcp/tcp.go:115 +0x176
github.com/elastic/beats/packetbeat/protos/tcp.(*TCP).Process(0xc4213bf170, 0xc42016f980, 0xc42131b2a0, 0xc4235c0300)
/go/src/github.com/elastic/beats/packetbeat/protos/tcp/tcp.go:208 +0x3a2
github.com/elastic/beats/packetbeat/decoder.(*Decoder).onTCP(0xc42131ad00, 0xc4235c0300)
/go/src/github.com/elastic/beats/packetbeat/decoder/decoder.go:317 +0xee
github.com/elastic/beats/packetbeat/decoder.(*Decoder).process(0xc42131ad00, 0xc4235c0300, 0x2c, 0x578, 0x1176700, 0xc42131ad00)
/go/src/github.com/elastic/beats/packetbeat/decoder/decoder.go:258 +0x222
github.com/elastic/beats/packetbeat/decoder.(*Decoder).OnPacket(0xc42131ad00, 0xc42335b844, 0x578, 0x578, 0xc42515ad20)
/go/src/github.com/elastic/beats/packetbeat/decoder/decoder.go:164 +0x343
github.com/elastic/beats/packetbeat/sniffer.(*SnifferSetup).Run(0xc42016df80, 0xc1f418, 0xc42112e040)
/go/src/github.com/elastic/beats/packetbeat/sniffer/sniffer.go:381 +0x493
github.com/elastic/beats/packetbeat/beater.(*packetbeat).Run.func2(0xc42112e040, 0xc4200acb40, 0xc4213f0060)
/go/src/github.com/elastic/beats/packetbeat/beater/packetbeat.go:155 +0x6a
created by github.com/elastic/beats/packetbeat/beater.(*packetbeat).Run
/go/src/github.com/elastic/beats/packetbeat/beater/packetbeat.go:159 +0x1b2

2017-05-23T07:53:10-05:00 INFO Packetbeat send stop signal
2017-05-23T07:53:10-05:00 INFO Input finish. Processed 619800 packets. Have a nice day!
2017-05-23T07:53:10-05:00 INFO flows worker loop stopped
2017-05-23T07:53:10-05:00 INFO Total non-zero values: libbeat.es.call_count.PublishEvents=1282 libbeat.es.publish.read_bytes=644829 libbeat.es.publish.write_bytes=46014178 libbeat.es.published_and_acked_events=62753 libbeat.publisher.messages_in_worker_queues=46682 libbeat.publisher.published_events=65404 memcache.unfinished_transactions=10901 memcache.unmatched_requests=3 memcache.unmatched_responses=6250 tcp.dropped_because_of_gaps=11566
2017-05-23T07:53:10-05:00 INFO Uptime: 1m19.569795953s
2017-05-23T07:53:10-05:00 INFO packetbeat stopped.


(ruflin) #4

It you receive a parse memcache exception it would be good to report this on Github as an issue. Please also add the exact memcache version you are using and your setup. If you could provide some pcap files captured from your traffic, that would be even better for debugging.


(Tim Arp) #5

I will try to create some documentation for the parse errors. Do you think the high cpu utilization is related to this? I didn't have a problem with 5.3 packetbeat.


(ruflin) #6

Could be, lets see as soon as we have all the info together in a Github issue.


(Tudor Golubenco) #7

Note that Packetbeat has to do sniffing and parsing of all Memcache requests/response. If you have a busy memcache instance, Packetbeat is going to have a hard time keeping up with the highly optimized C implementation of memcache, because it also has more work to do.

You have a couple of options:

  • Add a nice value to Packetbeat, or restrict it's CPU usage with Cgroups, or limit it to a single CPU (export GOMAXPROC=1). All this could result in Packetbeat missing parts of the traffic, but sampling should be random enough that you still get value from it.
  • Use the Metricbeat module to monitor memcache. This will only give high level metrics, though.
  • Consider monitoring via logs on the application side

HTH


(Tim Arp) #8

Hi Tudor,

That's a good point. What I really want or care about is the response time currently of the Memcache. I don't need to see every single interaction. For app interactions with memcache we are collecting metrics in AppDynamics from the application point of view so we have that covered. What I need to do is create a synthetic transaction and record that time. It may make more sense to look at heartbeat and do something with that instead.

Thanks,
Tim


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.