Why does packetbeat's "libbeat.publisher.published_events" value change every run?


#1

Hi all,
I use packetbeat to analyse a pcap file of Redis protocol traffic. The command I use is

./packetbeat -c ./packetbeat.yml -e -I redis_xg-bjdev-rediscluster-2_prot-7101_20161222110723_20161222110733.pcap -E packetbeat.protocols.redis.ports=7101 -t

and then I get output like this:

2017/01/16 08:09:03.759058 beat.go:182: INFO Setup Beat: packetbeat; Version: 6.0.0-alpha1
2017/01/16 08:09:03.759514 file.go:54: INFO File output path set to: ./logs
2017/01/16 08:09:03.759942 file.go:55: INFO File output base filename set to: packetbeat
2017/01/16 08:09:03.760467 file.go:58: INFO Rotate every bytes set to: 10240000
2017/01/16 08:09:03.761045 file.go:62: INFO Number of files set to: 7
2017/01/16 08:09:03.761626 outputs.go:108: INFO Activated file as output plugin.
2017/01/16 08:09:03.762141 publish.go:295: INFO Publisher name: vagrant-ubuntu-trusty
2017/01/16 08:09:03.762712 async.go:63: INFO Flush Interval set to: -1s
2017/01/16 08:09:03.763242 async.go:64: INFO Max Bulk Size set to: -1
2017/01/16 08:09:03.763917 procs.go:79: INFO Process matching disabled
2017/01/16 08:09:03.764557 protos.go:89: INFO registered protocol plugin: memcache
2017/01/16 08:09:03.765030 protos.go:89: INFO registered protocol plugin: mysql
2017/01/16 08:09:03.765582 protos.go:89: INFO registered protocol plugin: nfs
2017/01/16 08:09:03.766002 protos.go:89: INFO registered protocol plugin: pgsql
2017/01/16 08:09:03.766519 protos.go:89: INFO registered protocol plugin: thrift
2017/01/16 08:09:03.767142 protos.go:89: INFO registered protocol plugin: amqp
2017/01/16 08:09:03.767680 protos.go:89: INFO registered protocol plugin: dns
2017/01/16 08:09:03.768176 protos.go:89: INFO registered protocol plugin: http
2017/01/16 08:09:03.768594 protos.go:89: INFO registered protocol plugin: mongodb
2017/01/16 08:09:03.769073 protos.go:89: INFO registered protocol plugin: redis
2017/01/16 08:09:03.769667 protos.go:89: INFO registered protocol plugin: cassandra
2017/01/16 08:09:03.770873 beat.go:212: INFO packetbeat start running.
2017/01/16 08:09:03.788220 logp.go:245: INFO Metrics logging every 30s
2017/01/16 08:09:05.615160 sniffer.go:384: INFO Input finish. Processed 72257 packets. Have a nice day!
2017/01/16 08:09:05.616699 util.go:48: INFO flows worker loop stopped
2017/01/16 08:09:05.617133 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=12880 redis.unmatched_responses=23 tcp.dropped_because_of_gaps=4
2017/01/16 08:09:05.619232 logp.go:272: INFO Uptime: 1.978363811s
2017/01/16 08:09:05.619735 beat.go:216: INFO packetbeat stopped.
root@vagrant-ubuntu-trusty:/go/src/github.com/elastic/beats/packetbeat#

The problem is: every time I run the command above, the output value of "libbeat.publisher.published_events" is different.

Here are three test samples:

root@vagrant-ubuntu-trusty:/go/src/github.com/elastic/beats/packetbeat# ./packetbeat -c ./packetbeat.yml -e -I redis_xg-bjdev-rediscluster-2_prot-7101_20161222110723_20161222110733.pcap -E packetbeat.protocols.redis.ports=7101 -t
...
2017/01/16 08:25:21.078071 logp.go:271: INFO Total non-zero values:  tcp.dropped_because_of_gaps=4 redis.unmatched_responses=23 libbeat.publisher.published_events=12071
...
root@vagrant-ubuntu-trusty:/go/src/github.com/elastic/beats/packetbeat# ./packetbeat -c ./packetbeat.yml -e -I redis_xg-bjdev-rediscluster-2_prot-7101_20161222110723_20161222110733.pcap -E packetbeat.protocols.redis.ports=7101 -t
...
2017/01/16 08:25:24.575450 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=13601 redis.unmatched_responses=23 tcp.dropped_because_of_gaps=4
...
root@vagrant-ubuntu-trusty:/go/src/github.com/elastic/beats/packetbeat# ./packetbeat -c ./packetbeat.yml -e -I redis_xg-bjdev-rediscluster-2_prot-7101_20161222110723_20161222110733.pcap -E packetbeat.protocols.redis.ports=7101 -t
...
2017/01/16 08:25:29.346508 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=10775 redis.unmatched_responses=23 tcp.dropped_because_of_gaps=4
...
root@vagrant-ubuntu-trusty:/go/src/github.com/elastic/beats/packetbeat#

As shown above, I get 12071, 13601, and 10775 respectively.

I can't figure out the reason behind this yet, but I'm still working on it.
So, can someone explain why this happens, and what I can do about it?
Thanks a lot!

FYI, my packetbeat.yml is as follows:

packetbeat.interfaces.device: any

packetbeat.flows:
  timeout: 30s
  period: 10s

#========================== Transaction protocols =============================

packetbeat.protocols.icmp:
  enabled: true

packetbeat.protocols.amqp:
  ports: [5672]

packetbeat.protocols.cassandra:
  ports: [9042]

packetbeat.protocols.dns:
  ports: [53]
  include_authorities: true
  include_additionals: true

packetbeat.protocols.http:
  ports: [80, 8080, 8000, 5000, 8002]

packetbeat.protocols.memcache:
  ports: [11211]

packetbeat.protocols.mysql:
  ports: [3306]

packetbeat.protocols.pgsql:
  ports: [5432]

packetbeat.protocols.redis:
  ports: [6379]

packetbeat.protocols.thrift:
  ports: [9090]

packetbeat.protocols.mongodb:
  ports: [27017]

packetbeat.protocols.nfs:
  ports: [2049]

#------------------------------- File output -----------------------------------
output.file:
  enabled: true
  path: "./logs"
  filename: packetbeat
  rotate_every_kb: 10000
  number_of_files: 7

Why does the number of hits for the same pcap file keep changing every time I run packetbeat and open Kibana to view the indexed data?
(Steffen Siering) #2

What's the issue with this number changing? This metric documents the number of events published since it was last reported (events per 30 seconds). packetbeat reports an event whenever it can correlate a finished transaction.


#3

My key point is: every time I run the command ./packetbeat -c ./packetbeat.yml -e -I redis_xxxx.pcap -t, the result should be the same. In other words, I can't figure out why the result changes every time, given the same pcap file and the same command arguments.

I did some research on the code yesterday; the output above is generated here:

func LogTotalExpvars(cfg *Logging) {
	if cfg.Metrics.Enabled != nil && *cfg.Metrics.Enabled == false {
		return
	}
	vals := map[string]int64{}
	prevVals := map[string]int64{}
	snapshotExpvars(vals)
	metrics := buildMetricsOutput(prevVals, vals)
	Info("Total non-zero values: %s", metrics)
	Info("Uptime: %s", time.Now().Sub(startTime))
}

According to the code, the metrics in the final output cover the entire content of the pcap file. The pcap file is "static", so why aren't the metrics?

Please help! Thanks.


(Steffen Siering) #4

Have a look at buildMetricsOutput.

func buildMetricsOutput(prevVals map[string]int64, vals map[string]int64) string {
	metrics := ""
	for k, v := range vals {
		delta := v - prevVals[k]
		if delta != 0 {
			metrics = fmt.Sprintf("%s %s=%d", metrics, k, delta)
		}
	}
	return metrics
}

This function only adds the 'delta' since the last report. With the report being triggered by a timer, the actual numbers are highly sensitive to timing. Have you tried summing up all the values?

When driving packetbeat from a pcap file, packetbeat might quit before all transactions have been published (caused by the outputs/sink being slower than packetbeat, plus the outputs batching up multiple events). Use -waitstop N (e.g. 10) to have packetbeat wait N seconds to publish any outstanding events. With stats being reported every 30s, try -waitstop 60 and sum up the values. (In filebeat we can stop once all events have been processed; feel free to open a feature request for packetbeat to quit only after all events are published and to report the number of events on shutdown.) To get more consistent timing, better use the console output and pipe stdout to /dev/null (logs are written to stderr).


#5

Thank you very much for the prompt reply. @steffens

According to your advice, I did some tests, and the results are very interesting:

My original test command

Use -t to read packets as fast as possible, without sleeping.

./packetbeat -c ./packetbeat.yml -e -I redis_xxx.pcap -E packetbeat.protocols.redis.ports=7101 -t

and the outputs are:

... // first try
2017/01/18 03:19:09.313038 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=12435 redis.unmatched_responses=23 tcp.dropped_because_of_gaps=4
2017/01/18 03:19:09.314856 logp.go:272: INFO Uptime: 1.917766418s
... // second try
2017/01/18 03:19:33.508026 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=11618 redis.unmatched_responses=23 tcp.dropped_because_of_gaps=4
2017/01/18 03:19:33.509738 logp.go:272: INFO Uptime: 1.784077447s
...

As you said, "packetbeat might quit before all transactions have been published". I think that is right.

Test with the -waitstop option

  • -waitstop=10
./packetbeat -c ./packetbeat.yml -e -I redis_xxx.pcap -E packetbeat.protocols.redis.ports=7101 -t -waitstop=10

the results are

... // first try
2017/01/18 03:44:31.744941 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=11927 tcp.dropped_because_of_gaps=4 redis.unmatched_responses=23
2017/01/18 03:44:31.744983 logp.go:272: INFO Uptime: 11.913750805s
... // second try
2017/01/18 03:44:45.721812 logp.go:271: INFO Total non-zero values:  tcp.dropped_because_of_gaps=4 redis.unmatched_responses=23 libbeat.publisher.published_events=11781
2017/01/18 03:44:45.721859 logp.go:272: INFO Uptime: 11.879544437s
... // third try
2017/01/18 03:45:00.514356 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=11938 redis.unmatched_responses=23 tcp.dropped_because_of_gaps=4
2017/01/18 03:45:00.514383 logp.go:272: INFO Uptime: 11.897740749s

As you can see, the value of "published_events" changes every time.

  • -waitstop=30
./packetbeat -c ./packetbeat.yml -e -I redis_xxx.pcap -E packetbeat.protocols.redis.ports=7101 -t -waitstop=30

the results are

... // first try
2017/01/18 03:50:25.763133 logp.go:256: INFO Non-zero metrics in the last 30s: redis.unmatched_responses=23 libbeat.publisher.published_events=12279 tcp.dropped_because_of_gaps=4
2017/01/18 03:50:27.445371 logp.go:271: INFO Total non-zero values:  redis.unmatched_responses=23 tcp.dropped_because_of_gaps=4 libbeat.publisher.published_events=12279
2017/01/18 03:50:27.445456 logp.go:272: INFO Uptime: 31.833347355s
... // second try
2017/01/18 03:51:28.245237 logp.go:256: INFO Non-zero metrics in the last 30s: tcp.dropped_because_of_gaps=4 libbeat.publisher.published_events=12554 redis.unmatched_responses=23
2017/01/18 03:51:29.984602 logp.go:271: INFO Total non-zero values:  tcp.dropped_because_of_gaps=4 redis.unmatched_responses=23 libbeat.publisher.published_events=12554
2017/01/18 03:51:29.984651 logp.go:272: INFO Uptime: 31.90361899s
... // third try
2017/01/18 03:52:20.128414 logp.go:256: INFO Non-zero metrics in the last 30s: tcp.dropped_because_of_gaps=4 libbeat.publisher.published_events=13294 redis.unmatched_responses=23
2017/01/18 03:52:21.924096 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=13294 tcp.dropped_because_of_gaps=4 redis.unmatched_responses=23
2017/01/18 03:52:21.925612 logp.go:272: INFO Uptime: 31.940463736s

As you can see, the value of "published_events" still changes every time.

  • -waitstop=60
./packetbeat -c ./packetbeat.yml -e -I redis_xxx.pcap -E packetbeat.protocols.redis.ports=7101 -t -waitstop=60

the results are

... // first try
2017/01/18 03:56:22.190707 logp.go:256: INFO Non-zero metrics in the last 30s: redis.unmatched_responses=23 libbeat.publisher.published_events=13762 tcp.dropped_because_of_gaps=4
2017/01/18 03:56:52.190229 logp.go:258: INFO No non-zero metrics in the last 30s
2017/01/18 03:56:53.971348 logp.go:271: INFO Total non-zero values:  tcp.dropped_because_of_gaps=4 libbeat.publisher.published_events=13762 redis.unmatched_responses=23
2017/01/18 03:56:53.971375 logp.go:272: INFO Uptime: 1m1.922213479s
... // second try
2017/01/18 03:57:27.882397 logp.go:256: INFO Non-zero metrics in the last 30s: libbeat.publisher.published_events=12689 tcp.dropped_because_of_gaps=4 redis.unmatched_responses=23
2017/01/18 03:57:57.881640 logp.go:258: INFO No non-zero metrics in the last 30s
2017/01/18 03:57:59.669571 logp.go:271: INFO Total non-zero values:  redis.unmatched_responses=23 tcp.dropped_because_of_gaps=4 libbeat.publisher.published_events=12689
2017/01/18 03:57:59.669609 logp.go:272: INFO Uptime: 1m1.935003565s
... // third try
2017/01/18 03:58:35.156785 logp.go:256: INFO Non-zero metrics in the last 30s: libbeat.publisher.published_events=12411 tcp.dropped_because_of_gaps=4 redis.unmatched_responses=23
2017/01/18 03:59:05.155914 logp.go:258: INFO No non-zero metrics in the last 30s
2017/01/18 03:59:06.976913 logp.go:271: INFO Total non-zero values:  tcp.dropped_because_of_gaps=4 libbeat.publisher.published_events=12411 redis.unmatched_responses=23
2017/01/18 03:59:06.976964 logp.go:272: INFO Uptime: 1m1.961463036s

As you can see, the value of "published_events" still changes no matter what value I choose.

So the conclusions are:

  • with the -t option, there is no need to sum up all the values; just using the "Total" result is fine.
  • with the -waitstop N option alone, the value of "published_events" still changes every time.

#6

Test with different queue_size values

The value of queue_size in packetbeat.yml is 1000 by default.

#================================ General =====================================
...
# Internal queue size for single events in processing pipeline
queue_size: 1000

  • case 1: queue_size=1000 and -waitstop=10
./packetbeat -c ./packetbeat.yml -e -I redis_xxx.pcap -E packetbeat.protocols.redis.ports=7101 -t -waitstop=10

the results are

... // first try
2017/01/18 06:13:21.498505 logp.go:271: INFO Total non-zero values:  tcp.dropped_because_of_gaps=4 libbeat.publisher.published_events=14184 redis.unmatched_responses=23
2017/01/18 06:13:21.498533 logp.go:272: INFO Uptime: 12.243276728s
... // second try
2017/01/18 06:14:48.174210 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=12337 tcp.dropped_because_of_gaps=4 redis.unmatched_responses=23
2017/01/18 06:14:48.174302 logp.go:272: INFO Uptime: 11.881996238s
... // third try
2017/01/18 06:15:13.140022 logp.go:271: INFO Total non-zero values:  redis.unmatched_responses=23 libbeat.publisher.published_events=11211 tcp.dropped_because_of_gaps=4
2017/01/18 06:15:13.140074 logp.go:272: INFO Uptime: 11.813326164s

As you can see, the value of "published_events" changes every time.

  • case 2: queue_size=10000 and -waitstop=10
./packetbeat -c ./packetbeat.yml -e -I redis_xxx.pcap -E packetbeat.protocols.redis.ports=7101 -t -waitstop=10

the results are

... // first try
2017/01/18 06:17:06.842958 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=20678 tcp.dropped_because_of_gaps=4 redis.unmatched_responses=23
2017/01/18 06:17:06.843001 logp.go:272: INFO Uptime: 12.031343837s
... // second try
2017/01/18 06:17:25.012243 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=20678 tcp.dropped_because_of_gaps=4 redis.unmatched_responses=23
2017/01/18 06:17:25.012284 logp.go:272: INFO Uptime: 12.116870701s
... // third try
2017/01/18 06:17:40.081266 logp.go:271: INFO Total non-zero values:  redis.unmatched_responses=23 libbeat.publisher.published_events=20678 tcp.dropped_because_of_gaps=4
2017/01/18 06:17:40.081294 logp.go:272: INFO Uptime: 12.001298232s

Oh my god, the value of "published_events" is 20678 every time; the results seem "stable"!

After dozens of tests, the result seems to be "almost" the same, because I actually saw this:

... // Nth try
2017/01/18 06:22:48.935181 logp.go:271: INFO Total non-zero values:  libbeat.publisher.published_events=20620 tcp.dropped_because_of_gaps=4 redis.unmatched_responses=23

But anyway, I think I've got the answer.


(Steffen Siering) #7

What's your use-case? Which output are you using?

In packetbeat, the traffic analyzer and the outputs operate independently, connected by a queue that buffers events. As the traffic analyzer needs to keep internal state, and packet drops due to slow-downs have bad effects on that state, the queue between the traffic analyzer and the outputs does not block. That is, in order to protect internal state, the queue drops events if the outputs can not keep up with the traffic analyzer. This is why an increased queue size helps in buffering events.

The pcap reading mode is mostly used for debugging/testing purposes; that's why there is no special 'blocking' support in the queue. But granted, support for reading from a file can be enhanced, e.g. by making the queue blocking (plus quitting only after all events have been published). If that's the use case you're after, feel free to open an enhancement request for packetbeat.


#8

What's your use-case?

My use case is: someone provides me several pcap files from a Redis cluster, and I must analyze all these files to show:

  • the number of completed requests
  • the latency statistics (top N)
  • the failure rate
  • something else

Which output are you using?

When debugging, I use the console (stderr) output, and the file output for transactions. In any case, the value of "published_events" is equal to the output of cat logs/packetbeat* | wc -l.

That is, in order to protect internal state, the queue drops events if the outputs can not keep up with the traffic analyzer.

Yes, I've found the code you described:

func (p *PacketbeatPublisher) PublishTransaction(event common.MapStr) bool {
	select {
	case p.trans <- event:
		return true
	default:
		// drop event if queue is full
		return false
	}
}

For me, reading from pcap files is a realistic and valuable use case, not just a debugging/testing feature. I must make sure the output results are "stable" before providing this tool to someone else (including my manager).

Finally, thanks again, and I will open an enhancement request for this soon.


#9

I have opened an issue for this feature. :slight_smile:


(Steffen Siering) #10

Thanks. You will also need this one to compute proper latency stats.


(system) #11

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.