Could you let me know if there is a link to detailed documentation on Topbeat that addresses the questions below?
How does it work? If process, system, and file are all enabled, are the records for each sent to the server in one push?
Are the records for process, files and system merged or sent separately with other columns empty?
What is the standard payload size or format?
Does Topbeat push information regardless of whether the values have changed?
Is there any indicator of the number of records sent in each push, and the approximate size of each push?
I have many more technical questions along these lines.
Let me try to answer the questions at a high level:
Events sent together: This depends on the output selected. In general, Beats tries to send as many events as possible together to Elasticsearch, because this is more efficient.
Merged or separate: The events are potentially sent together, but no merging or anything similar happens. I am not sure what you mean by empty columns?
Payload: The payload is different for each type and also depends heavily on the content. For example, if a process name is really long, it can make up a large part of the payload. The best approach is to try it out.
Changed / unchanged: Yes, data is sent every period, regardless of whether the values have changed.
# of records: This depends on your configuration and system. If you have 20 file systems, it will send 20 events; if you have 200 processes, it will send 200 events for processes. Again, it is best to try it out on your own system.
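As a rough illustration of the counting above, here is a minimal Python sketch. It is not part of Topbeat. It assumes one event per process, one per file system, and (my assumption) a single system-wide event per period. The function names, the 10-second period, and the 500-byte average event size are placeholders; measure the real values on your own system.

```python
SECONDS_PER_DAY = 86400


def events_per_period(num_processes, num_filesystems, system_enabled=True):
    """Events generated in one reporting period.

    One event per process and one per file system (as described above);
    a single system-wide event per period is an assumption.
    """
    system_events = 1 if system_enabled else 0
    return system_events + num_processes + num_filesystems


def events_per_day(per_period, period_seconds):
    """Events per day, given that data is sent every period regardless of changes."""
    return per_period * (SECONDS_PER_DAY // period_seconds)


def bytes_per_period(per_period, avg_event_bytes):
    """Rough payload estimate; avg_event_bytes must be measured, not guessed."""
    return per_period * avg_event_bytes


per_period = events_per_period(num_processes=200, num_filesystems=20)
print(per_period)                         # 221
print(events_per_day(per_period, 10))     # 1909440 (period of 10 s is a placeholder)
print(bytes_per_period(per_period, 500))  # 110500 (500 bytes is a placeholder)
```

This only gives the number of events generated. How they are batched into pushes still depends on the output, and the size depends on the content of each event.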
I hope this helps. Perhaps you can give some details on what your end goal is?
I understand your request for such a mathematical model (see also Beats Overhead Topbeat and Packetbeat). One of the challenges in publishing such a model is that it has lots of small variables which must be taken into account, and as a result most of the calculations turn out to be wrong. We could publish an example with data from a machine which uses the default config, but that would not apply to most production servers and would be misleading. Also, it depends not only on the Beat but also on the setup of Elasticsearch (number of replicas, shards).
When you do your POC, make sure it is as close to the production system as possible, especially for Packetbeat, as the type and content of the packets can make a big difference.
Thanks Ruflin, I appreciate your response and I understand the concern in general.
But I think we, as the Elasticsearch community, need a dedicated group that can help others with sizing: a special task force, so to speak.