Packetbeat - “ERR Failed to read integer reply: Expected digit”

When analyzing a pcap file containing Redis traffic with Packetbeat, I get output like the following:

➜  packetbeat git:(master) ✗ ./packetbeat -c ./packetbeat.yml -e -I redis_xg-bjdev-rediscluster-1_prot-7101_20161222110711_20161222110721.pcap -E packetbeat.protocols.redis.ports=7101 -t
...
2017/01/11 09:42:37.664290 protos.go:89: INFO registered protocol plugin: amqp
2017/01/11 09:42:37.664309 protos.go:89: INFO registered protocol plugin: http
2017/01/11 09:42:37.664315 protos.go:89: INFO registered protocol plugin: mysql
2017/01/11 09:42:37.664320 protos.go:89: INFO registered protocol plugin: redis
2017/01/11 09:42:37.665784 beat.go:207: INFO packetbeat start running.
2017/01/11 09:47:45.218365 redis_parse.go:306: ERR Failed to read integer reply: Expected digit
2017/01/11 09:42:38.430211 sniffer.go:384: INFO Input finish. Processed 40644 packets. Have a nice day!
2017/01/11 09:42:38.430657 util.go:48: INFO flows worker loop stopped
2017/01/11 09:42:38.430709 logp.go:245: INFO Total non-zero values:  libbeat.publisher.published_events=8080 tcp.dropped_because_of_gaps=15 redis.unmatched_responses=15
2017/01/11 09:42:38.430722 logp.go:246: INFO Uptime: 909.957024ms
2017/01/11 09:42:38.430728 beat.go:211: INFO packetbeat stopped.
➜  packetbeat git:(master) ✗

The error message is "ERR Failed to read integer reply: Expected digit".

After analyzing the same capture with Wireshark for comparison, I found the root cause behind this: when the Redis response is big enough and the network is not good enough, packet loss can happen, which results in the error above.

Snapshot in my test:

According to the packet sequence, I find the following:

  • No.37642 - HMGET with key hr-e0acd6e0-4c21-4917-a676-c4fd8094f2aa:user
*10
$5
HMGET
$44
hr-e0acd6e0-4c21-4917-a676-c4fd8094f2aa:user
$5
18370
$5
52708
$1
0
$5
18370
$4
1117
$2
13
$3
153
$3
147
  • No.37642 - [TCP Previous segment not captured]; this response data corresponds to the last two fields of the HMGET.
:{"id":153,"email":"xiaojiao.xie@xxx","work_code":"E000027","mobile":186xxxx0925,"name":".........","walle_id":77287,"status":6,"pinyin_name":"xxj","sex":1,"security_level":60,"certificate_type":0,"certificate_number":"42068xxxxxxxxx3733","created_at":1431550029000,"updated_at":1449228237000,"nchr_id":"0001A910000000002EQP"}}
$519
{"userId":147,"userBuList":[3175],"tagsList":[],"userBuRoleDto":[{"id":81524,"bu_id":3175,"bu_name":"............BU","role_id":859,"role_name":".........","user_id":147,"user_name":"......"}],"user":{"id":147,"email":"xin.jin@xxxx","work_code":"E000029","mobile":186xxxx5626,"name":"......","walle_id":56063,"status":6,"pinyin_name":"jx","sex":1,"security_level":70,"certificate_type":0,"certificate_number":"420106xxxxxx510","created_at":1431550030000,"updated_at":1449228115000,"nchr_id":"0001A910000000002ERE"}}
  • No.37648 - [TCP Fast Retransmission], which retransmits the lost data segments.
*8
$532
{"userId":18370,"userBuList":[4594],"tagsList":[],"userBuRoleDto":[{"id":120993,"bu_id":4594,"bu_name":"..................","role_id":924,"role_name":"......","user_id":18370,"user_name":"......"}],"user":{"id":18370,"email":"hui.yaobj@xxx","work_code":"E019529","mobile":137xxxx8281,"name":"......","walle_id":23156752,"status":6,"pinyin_name":"yh","sex":1,"security_level":20,"certificate_type":0,"certificate_number":"13098xxxxxx033","created_at":1438657893000,"updated_at":1449228019000,"nchr_id":"0001A910000000013PM5"}}
...
$526
{"userId":153,"userBuList":[3174],"tagsList":[],"userBuRoleDto":[{"id":53306,"bu_id":3174,"bu_name":"............","role_id":922,"role_name":"......","user_id":153,"user_name":"........."}],"user"

In this case the redis_parse.go module does not work correctly, and Packetbeat stops running as soon as the error happens.

So I think this is a bug; or is there a way I can fix it? Thanks in advance.

BTW, I think that if the Redis response happens to be split at a different position, the error message might be different.
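To illustrate why the split position matters, here is a minimal, hypothetical Go sketch of a RESP reply parser (it is not Packetbeat's actual redis_parse.go, just an approximation of the failure mode): when the stream resumes in the middle of a bulk-string payload, whatever byte comes first is treated as the type marker of a new reply. In my capture the resumed data starts with ':{', so the ':' is taken as the start of an integer reply and the parser then fails on '{'; a different split position would hit a different branch and produce a different error.

// Minimal RESP reply-parsing sketch (not Packetbeat's redis_parse.go) showing
// how the byte at the split position determines which error is reported.
package main

import (
    "bufio"
    "fmt"
    "strings"
)

// parseReply reads the RESP type marker and dispatches on it.
func parseReply(r *bufio.Reader) error {
    prefix, err := r.ReadByte()
    if err != nil {
        return err
    }
    switch prefix {
    case ':': // integer reply, e.g. ":1000\r\n"
        return parseInteger(r)
    case '+', '-', '$', '*': // other reply types, skipped in this sketch
        _, err := r.ReadString('\n')
        return err
    default:
        return fmt.Errorf("unexpected reply type marker %q", prefix)
    }
}

// parseInteger accepts only digits up to CRLF, mirroring the failure mode
// behind "Failed to read integer reply: Expected digit".
func parseInteger(r *bufio.Reader) error {
    for {
        b, err := r.ReadByte()
        if err != nil {
            return err
        }
        if b == '\r' {
            _, err = r.ReadByte() // consume the trailing '\n'
            return err
        }
        if b < '0' || b > '9' {
            return fmt.Errorf("expected digit, got %q", b)
        }
    }
}

func main() {
    // Stream resumed mid-payload after packet loss: the first visible bytes
    // are `:{"id":153,...`, so ':' is misread as the start of an integer reply.
    resumed := `:{"id":153,"email":"..."}`
    fmt.Println(parseReply(bufio.NewReader(strings.NewReader(resumed))))
    // prints: expected digit, got '{'
}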

The protocol analyzers in Packetbeat must stay synchronized with the network stream. If they are not in sync, the parser might fail. In that case the internal state is dropped and a new parser instance is created, in the hope of being back in sync with the next packet. As packet loss can occur at any time, we have to drop the parser state in most such cases and try to resync.
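Below is a highly simplified, hypothetical Go sketch of that drop-and-resync behaviour (not the actual Packetbeat/libbeat code): the connection throws away its parser, and with it any buffered state, whenever parsing fails, and starts with a fresh parser on the next segment.

// Toy sketch of dropping parser state and re-syncing (not Packetbeat's code).
package main

import (
    "bytes"
    "fmt"
)

// parser stands in for a per-connection protocol parser with internal state.
type parser struct {
    buffered []byte // bytes of a message still being assembled across segments
}

// feed hands one TCP segment to the parser; an error means we lost sync.
func (p *parser) feed(segment []byte) error {
    if len(p.buffered) == 0 && len(segment) > 0 && segment[0] != '*' {
        // RESP client requests start with '*'; anything else at a message
        // boundary means we resumed mid-payload, e.g. after packet loss.
        return fmt.Errorf("out of sync: unexpected leading byte %q", segment[0])
    }
    p.buffered = append(p.buffered, segment...)
    if bytes.HasSuffix(p.buffered, []byte("\r\n")) {
        p.buffered = nil // pretend the message was completed and published
    }
    return nil
}

// connection owns the parser and recreates it whenever parsing fails.
type connection struct{ p *parser }

func (c *connection) onSegment(segment []byte) {
    if c.p == nil {
        c.p = &parser{}
    }
    if err := c.p.feed(segment); err != nil {
        fmt.Println("parse error, dropping state:", err)
        c.p = &parser{} // fresh parser instance, hoping to re-sync on later packets
    }
}

func main() {
    c := &connection{}
    c.onSegment([]byte("*1\r\n$4\r\nPING\r\n")) // in sync
    c.onSegment([]byte("{\"id\":153}\r\n"))     // resumed mid-payload: state dropped
    c.onSegment([]byte("*1\r\n$4\r\nPING\r\n")) // back in sync with the new parser
}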

Packetbeat stops running because all packets have been sent to the protocol analyzers. Not all events might be published yet; try -waitstop 10s to wait 10 more seconds in case any events are still pending.
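For example, based on the command from the original post (assuming -waitstop takes a plain number of seconds; the exact form may depend on the Packetbeat version, see ./packetbeat -h):

./packetbeat -c ./packetbeat.yml -e -I redis_xg-bjdev-rediscluster-1_prot-7101_20161222110711_20161222110721.pcap -E packetbeat.protocols.redis.ports=7101 -t -waitstop 10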

Without the original pcap, and without being able to debug it myself, I cannot comment on the specific events you observed.

I did some work on my original pcap file, and split it into several parts.

So my conclusion now is: when the network is bad and the Redis response is big enough, whether something goes wrong depends on where the raw data gets split.

Packetbeat stops running because all packets have been sent to the protocol analyzers.

That's right, I get it.

What does "big enough" mean? Currently, Packetbeat internally drops a stream if the active message exceeds 10MB, I think.
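For illustration only, a tiny hypothetical sketch of that kind of size guard (not the actual Packetbeat code; the 10MB figure is simply the number mentioned above): once the message being assembled would exceed the limit, the stream is dropped instead of buffering indefinitely.

// Toy size guard (not Packetbeat's code).
package main

import (
    "errors"
    "fmt"
)

const maxMessageSize = 10 * 1024 * 1024 // assumed 10MB limit, per the reply above

var errMessageTooLarge = errors.New("active message exceeds limit, dropping stream")

// stream accumulates the bytes of the message currently being parsed.
type stream struct {
    message []byte
}

// append adds segment data, refusing to grow past maxMessageSize.
func (s *stream) append(data []byte) error {
    if len(s.message)+len(data) > maxMessageSize {
        return errMessageTooLarge
    }
    s.message = append(s.message, data...)
    return nil
}

func main() {
    s := &stream{}
    fmt.Println(s.append(make([]byte, 1024)))           // <nil>
    fmt.Println(s.append(make([]byte, maxMessageSize))) // active message exceeds limit, dropping stream
}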

"big enough" means that the raw data in redis response should be split into multiple segmented packets (i.e. [PSH, ACK] segment in 37642-37651_plrt_s0_pbErr.pcap). Sorry about my description.
