Is there any way to do data sampling on the client side?

Hi,

Packetbeat is sending too much data to the server. Is there any way to reduce the amount of data transmitted, either by aggregation or by some sampling technique in Packetbeat itself?

Any suggestions or pointers will be helpful.

Thanks
Harsha

In Packetbeat 5.0, filtering will be available: https://github.com/elastic/beats/issues/451 This does not do any aggregation, but it could help you filter out traffic you don't need.

Hi Ruflin,

Initially I want to implement sampling, for example sending 1 transaction out of every 10. Where can I add my code? Could you please give me some pointers?

Thanks
Harsha
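
For illustration, the "1 transaction out of every 10" idea can be sketched in Go as a simple counter. This is only a minimal sketch: the `EveryNth` type and its `Keep` method are hypothetical names, not part of Packetbeat or libbeat, and where such a check would be called in the publishing path is left open.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// EveryNth is a hypothetical sampler that keeps one event out of every n.
// It is an illustration only and not an existing Packetbeat/libbeat type.
type EveryNth struct {
	n     uint64
	count uint64 // total events seen so far, updated atomically
}

// NewEveryNth returns a sampler that keeps 1 of every n events.
// n == 0 is treated as 1, i.e. keep everything.
func NewEveryNth(n uint64) *EveryNth {
	if n < 1 {
		n = 1
	}
	return &EveryNth{n: n}
}

// Keep reports whether the current event should be forwarded.
// The first event is kept, then every n-th event after it.
func (s *EveryNth) Keep() bool {
	seen := atomic.AddUint64(&s.count, 1)
	return (seen-1)%s.n == 0
}

func main() {
	s := NewEveryNth(10)
	kept := 0
	for i := 0; i < 100; i++ {
		if s.Keep() {
			kept++
		}
	}
	fmt.Println("kept", kept, "of 100 transactions")
}
```

The atomic counter is used so that the check stays safe if several goroutines publish transactions at the same time. Note that a fixed 1-in-N pattern can line up with periodic traffic, which is one reason random sampling is sometimes preferred.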

Is what you want to do protocol-specific or general? I assume you are aware of the Packetbeat source code: https://github.com/elastic/beats/tree/master/packetbeat

I want to do this in general, for all protocols.

I have gone through the Packetbeat source code, but I'm fairly new to the Go language and it is taking a long time. I would be grateful if you could point me to a file or a method where I can add my code to do sampling.

Thanks in advance ruflin

@steffens You are probably the best person to point @harshafrnd4u to the right place?

We've been thinking about adding some kind of sampling to the filtering support in Beats (currently in development) at some point in the future. It is not on the roadmap for 5.0, but if the community has a good proposal or implementation (in the form of a pull request), we will definitely consider adding it.

@monica is currently working on generic filtering support. The filtering code can be found in [libbeat/filter](https://github.com/elastic/beats/tree/master/libbeat/filter). For sample code, check out the drop_event filter, which supports dropping events if some configurable condition is met. For sampling I can imagine something similar (e.g. apply random sampling with probability X if the condition applies, otherwise forward the event). You can find filtering-related GitHub issues via the :Filtering tag.
It's up to you whether you want to try creating a PR as a basis for discussion, or whether you think it's better to discuss the implementation and configuration details for sampling in a separate GitHub issue first.
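
As a rough sketch of the idea described above (random sampling with probability X if a condition applies, otherwise forward the event), something like the following Go code could serve as a starting point for discussion. Everything here is an assumption made for illustration: the `Event` map type, the `Sampler` struct, and the `Filter` method are hypothetical and do not mirror the actual interfaces in libbeat/filter.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Event is a simplified, hypothetical stand-in for a published event.
type Event map[string]interface{}

// Sampler is a hypothetical sampling filter. Events that match Condition
// are kept with the given Probability; all other events pass through.
// The embedded *rand.Rand is not safe for concurrent use, so a real
// implementation would need a lock or one sampler per goroutine.
type Sampler struct {
	Probability float64          // chance in [0, 1] of keeping a matching event
	Condition   func(Event) bool // nil means "sample every event"
	rng         *rand.Rand
}

// NewSampler builds a sampler with its own seeded random source.
func NewSampler(p float64, cond func(Event) bool, seed int64) *Sampler {
	return &Sampler{
		Probability: p,
		Condition:   cond,
		rng:         rand.New(rand.NewSource(seed)),
	}
}

// Filter returns the event if it should be forwarded, or nil if dropped.
func (s *Sampler) Filter(ev Event) Event {
	if s.Condition != nil && !s.Condition(ev) {
		return ev // condition does not apply: forward unchanged
	}
	if s.rng.Float64() < s.Probability {
		return ev // sampled in
	}
	return nil // sampled out
}

func main() {
	// Keep roughly 10% of HTTP events and forward everything else.
	isHTTP := func(ev Event) bool { return ev["type"] == "http" }
	s := NewSampler(0.1, isHTTP, 42)

	keptHTTP, keptOther := 0, 0
	for i := 0; i < 1000; i++ {
		if s.Filter(Event{"type": "http"}) != nil {
			keptHTTP++
		}
		if s.Filter(Event{"type": "mysql"}) != nil {
			keptOther++
		}
	}
	fmt.Println("http kept:", keptHTTP, "mysql kept:", keptOther)
}
```

With this design, the probability and the condition would be the two configurable parts, which is also what a separate GitHub issue on configuration details would most likely need to settle.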