Corrupted HTTP transaction data under heavy load

(Vladimir Aleksandrov) #1

Under heavy load (~100K tpm), I see several kinds of data corruption in the recorded HTTP transactions:

  • broken request method names such as GEGET and GETGET
  • invalid response codes such as 0, 1, 2, or 65 instead of 200
  • a missing forward slash in the request path: GET geo.v1/pub/place/details instead of GET /geo.v1/pub/place/details

I cross-checked against the web service access log: all of those requests are properly formatted there and have response code 200.

Downgrading packetbeat from 1.2.2 to 1.1.2 helped a little: the error rate is lower, but I still see corrupted data.

Corrupted method example:

{ "@timestamp":"2016-05-12T22:35:31.158Z", "beat":{ "hostname":"--masked--", "name":"pgeo1" }, "bytes_in":239, "bytes_out":592, "client_ip":"", "client_port":16309, "client_proc":"", "client_server":"", "count":1, "direction":"in", "http":{ "code":200, "content_length":250, "phrase":"OK" }, "ip":"", "method":"GEGET", "params":"city=Fordland\u0026country=US", "path":"/pgeo.v1/pub/place/details", "port":9080, "proc":"", "query":"GEGET /pgeo.v1/pub/place/details", "real_ip":"", "responsetime":16, "server":"", "status":"OK", "tags":[ "pgeo.v1" ], "type":"http" }

Corrupted response code (635) and a fragment of the response body in place of the "OK" phrase:
{ "@timestamp": "2016-05-12T22:33:51.911Z", "beat": { "hostname": "--masked--", "name": "pgeo1" }, "bytes_in": 240, "bytes_out": 1188, "client_ip": "", "client_port": 10449, "client_proc": "", "client_server": "", "count": 1, "direction": "in", "http": { "code": 635, "content_length": 255, "phrase": "Dorado\",\"adminCode2\":\"017\",\"adminName3\":\"\",\"adminCode3\":\"\",\"latitude\":38.9143,\"longitude\":-120.9001,\"accuracy\":0,\"success\":true},{\"countryCode\":\"US\",\"postalCode\":\"95643\",\"placeName\":\"Kelsey\",\"adminName1\":\"California\",\"adminCode1\":\"Cf9" }, "ip": "", "method": "GET", "params": "country=US&postalcode=16127", "path": "/pgeo.v1/pub/place/details", "port": 9080, "proc": "", "query": "GET /pgeo.v1/pub/place/details", "real_ip": "", "responsetime": 19, "server": "", "status": "Error", "tags": [ "pgeo.v1", "beatevent", "beats_input_raw_event" ], "type": "http", "@version": "1", "host": "--masked--" }
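
To put a number on the error rate, the three symptoms described above can be checked mechanically. The following is only a rough sketch, not anything packetbeat ships: it assumes the events have been exported as one JSON object per line, and it uses the field names visible in the examples above (method, http.code, path). The function names find_corruption and scan are made up for illustration.

```python
import json

# Standard request methods (RFC 7231) plus PATCH (RFC 5789).
# Assumption: the monitored service uses no custom methods; extend if it does.
VALID_METHODS = {
    "GET", "HEAD", "POST", "PUT", "DELETE",
    "CONNECT", "OPTIONS", "TRACE", "PATCH",
}


def find_corruption(event):
    """Return a list of problems found in one packetbeat HTTP event dict."""
    problems = []

    # Symptom 1: mangled method names such as GEGET or GETGET.
    method = event.get("method", "")
    if method not in VALID_METHODS:
        problems.append("bad method: %r" % method)

    # Symptom 2: status codes outside the valid 100-599 range (0, 1, 65, 635).
    code = event.get("http", {}).get("code")
    if not isinstance(code, int) or not 100 <= code <= 599:
        problems.append("bad status code: %r" % code)

    # Symptom 3: request path that lost its leading slash.
    path = event.get("path", "")
    if not path.startswith("/"):
        problems.append("path lacks leading slash: %r" % path)

    return problems


def scan(lines):
    """Yield (timestamp, problems) for every HTTP event that looks corrupted."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != "http":
            continue
        problems = find_corruption(event)
        if problems:
            yield event.get("@timestamp"), problems
```

Feeding it an open file of exported events, for example scan(open("events.json")) with a hypothetical file name, yields one entry per suspicious event, which makes it easier to compare error rates between packetbeat versions.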

(Steffen Siering) #2

This is the first time I've seen this happen. Do you have a pcap that reproduces the issue? packetbeat can record one via -dump <dump.pcap>. You can then check whether the pcap reproduces the issue, e.g. via

./packetbeat -e -N -I <dump.pcap> -d 'publish'

The -N option disables all outputs (so no data is indexed while testing your pcap), and -d 'publish' puts the publisher in debug mode, printing the events to be published to the console.
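
If the replay needs to be repeated, for example against both 1.1.2 and 1.2.2, it can be wrapped in a small script. This is only a sketch: the helper names replay_command and replay are made up, the default ./packetbeat location is an assumption, and the flags are exactly the ones from the command above.

```python
import subprocess


def replay_command(pcap_path, packetbeat="./packetbeat"):
    """Build the argv for replaying a pcap with all outputs disabled (-N)."""
    return [packetbeat, "-e", "-N", "-I", pcap_path, "-d", "publish"]


def replay(pcap_path, log_path, packetbeat="./packetbeat"):
    """Run the replay, writing stdout and stderr to log_path.

    Returns the exit code of the packetbeat process.
    """
    with open(log_path, "wb") as log:
        return subprocess.call(
            replay_command(pcap_path, packetbeat),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
```

Calling replay("dump.pcap", "replay.log") once per packetbeat binary leaves one log per version to compare.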
