Topbeat - Detailed flowchart or documentation on how it works

Dear Team,

Could you let me know whether there is detailed documentation on Topbeat that addresses the questions below?

How does it work? If process, system and file system reporting are all enabled, are the records for each sent to the server in one push?
Are the records for processes, file systems and system merged, or sent separately with the other columns empty?
What is the standard payload size or format?
Does Topbeat push information regardless of whether the values have changed?
Is there any indicator of the number of records sent in each push and their approximate size?
There are many more technical questions along these lines.

Can someone help address these or point me to a manual?

Regards, Chirag Shah

Trying to answer the questions more on a high level:

  • Events sent together: This depends on the output selected. In general, Beats try to send as many events as possible together to Elasticsearch, since this is more efficient.
  • Merging: The events are potentially sent together, but no merging or similar happens. I'm not sure what you mean by empty columns.
  • Payload: The payload is different for each type and also depends heavily on the content. For example, if a process name is really long, it can be a large part of the payload. The best approach is to try it out.
  • Changed / unchanged: Yes, data is sent every period, independent of whether the values changed.
  • Number of records: This depends on your configuration and system. If you have 20 file systems, it will send 20 events; if you have 200 processes, it will send 200 process events. Again, it is best to try it out with your system.
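
To make the last point concrete, here is a rough sketch of the event-count arithmetic in Python. It assumes one event per process and one per file system each period, as described above; the single system-wide event per period and the function names are assumptions for illustration, not something taken from the Topbeat source.

```python
SECONDS_PER_DAY = 86_400


def events_per_period(processes: int, filesystems: int, system_events: int = 1) -> int:
    """Events emitted in one reporting period.

    One event per process and one per file system, as stated above.
    system_events=1 is an assumption; check it against your own output.
    """
    return processes + filesystems + system_events


def events_per_day(per_period: int, period_seconds: int) -> int:
    """Events emitted in 24 hours for a given reporting period in seconds."""
    return per_period * (SECONDS_PER_DAY // period_seconds)


per_period = events_per_period(processes=200, filesystems=20)
print(per_period)                      # 221
print(events_per_day(per_period, 10))  # 1909440
```

This only counts events. It says nothing about their size, which still has to be measured on your own system.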

I hope this helps. Perhaps you can give some details on what your end goal is?

Dear Ruflin,

Thanks for the response.
At a high level I do understand, and I have a POC underway.

But when it comes to sizing a server for n Topbeat instances reporting every 10s/5m with a high-level configuration, say only processes,

I believe there should be a sizing sheet for this, so that a POC is not needed to validate everything. Is there a mathematical model to simulate sizing?

Regards, Chirag Shah

I understand your request for such a mathematical model (see also Beats Overhead Topbeat and Packetbeat). One of the challenges in publishing such a model is that it has lots of small variables which must be taken into account, so most of the calculations end up wrong. We could publish an example with data from a machine that uses the default config, but that would not apply to most production servers and would be misleading. It also depends not only on the Beat but on the Elasticsearch setup (number of replicas, shards).

When you do your POC, make sure it is as close to the production system as possible, especially for Packetbeat, as the type and content of the packets can make a big difference.
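
Once a POC has produced measured numbers, the remaining arithmetic is simple. The following Python sketch is only a lower-bound estimate under stated assumptions: the function name is hypothetical, `avg_event_bytes` is a placeholder you must replace with a value measured on your own data, and the result ignores index overhead, compression, mappings and shard layout.

```python
def daily_index_bytes(hosts: int,
                      events_per_host_per_day: int,
                      avg_event_bytes: int,
                      replicas: int = 1) -> int:
    """Rough raw data volume per day across primaries and replicas.

    Each replica stores a full copy of the primary data, hence the
    (1 + replicas) factor. Everything else about Elasticsearch storage
    is deliberately left out of this sketch.
    """
    primary = hosts * events_per_host_per_day * avg_event_bytes
    return primary * (1 + replicas)


# Placeholder inputs for illustration only, not measured values.
total = daily_index_bytes(hosts=50,
                          events_per_host_per_day=1_909_440,
                          avg_event_bytes=500,
                          replicas=1)
print(f"{total / 1e9:.1f} GB per day")
```

Treat the output as a starting point for a conversation about sizing, not as a prediction.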

Thanks Ruflin, I appreciate your response and I understand the concern in general.
But I think we, as the Elasticsearch community, need a dedicated group to help others with sizing, a special task force so to speak.

Anyway, thanks!

Regards, Chirag Shah
