Beat/Module for build metrics

I'm trying to figure out what the approach would be to collect build metrics on a build agent. Imagine we have a VM or Docker image which is created per build and destroyed after the build is completed. During the build (compile, unit tests, packaging) there is a build task which collects some source code metrics (sometimes called static code analysis metrics). It would be nice to send them to an external Elasticsearch/Kibana server to be indexed and displayed.
It looks like we cannot use any of the existing Beats:

  • Auditbeat uses a Linux-specific framework;
  • Filebeat is used for log files; maybe we could create a special log type for build metrics;
  • Heartbeat is for service availability testing;
  • Metricbeat is mostly service based;
  • Packetbeat is mostly for networking;
  • Winlogbeat is for Windows event logs

Is there any "Reportbeat" which can be used to generate a report and send it to Elasticsearch?

What exactly is generating these metrics, and how are they output? If you can write them to a file somehow, it seems Filebeat would be a good solution here.

Currently there are a few CSV files generated by each metrics report, and we have a few metrics reports implemented. Can we just send those files to Elasticsearch without implementing any services? From what I have read so far, Filebeat is a service-based Beat, and I did not find a "send a single file" API. It would be great if such an API were available.
Just FYI, we are running each build on a freshly created virtual machine instance, and after the build is finished the VM instance is destroyed.
There is a Java agent (TeamCity Agent) running as a service on each VM, and that agent runs the build. We are sending agent logs to a centralized syslog-ng, and there are a few options available to visualize them. One is AgentLogs->syslog-ng->Elastic->Kibana and another is AgentLogs->Filebeat->Elastic->Kibana. We tried the first one and it worked, so the second one should also work; we just have not implemented Filebeat for the TeamCity Agent yet.
But... the metrics are not part of the agent (service) logs, and they are not created for every build kind, so we need to send them out on a per-case basis; a "send a file" call would be great.

Could you post a couple of sample lines from your metrics files, along with the corresponding documents you would like to index into Elasticsearch? I imagine there is some structure to these lines that can be used to parse out metrics into document fields. If I can get a sense of the complexity of your parsing rules, I can make some concrete recommendations on how to proceed.

One of the metrics tools we use is Resource Standard Metrics (RSM): https://msquaredtechnologies.com/Resource-Standard-Metrics.html
When we run the RSM metrics report it produces a very detailed analysis report per C# file(!), including numbers counted even from inside functions, and at the end of the report it has aggregates: total/min/max/average. Here is example output for one .cs file (the report itself is a .csv file readable by Excel); there are approximately 1000+ files in the scope of the build.
Here it is:
Type, Name, File Date, Size, LOC/FP, eLOC/FP, lLOC/FP, Comments, Blanks, Lines, Comment/eLOC, Classes, Functions, Parameters, Returns, InterComp, CycloComp, Total Complexity, Notices

File, src\ABCStructuredStorage\ComInterfaces.cs, 08/18/18 09:22:24, 9297, 140/2.6 , 124/2.3 , 53/1.0 , 18, 24, 177, 0.15, 2, 0, 0, 0, 0, 0, 0, 72

f(), ABCStructuredStorage.ABCStorage.ABCStorage, , , 8/0.2 , 4/0.1 , 3/0.1 , 0, 0, 8, 0.00, , 1, 2, 1, 3, 2, 5,
f(), ABCStructuredStorage.ABCStorage.ABCStorage, , , 2/0.0 , 0/0.0 , 0/0.0 , 0, 0, 2, 0.00, , 1, 1, 1, 2, 1, 3,
f(), ABCStructuredStorage.ABCStorage.Dispose, , , 8/0.2 , 4/0.1 , 3/0.1 , 0, 0, 8, 0.00, , 1, 0, 1, 1, 2, 3,
f(), ABCStructuredStorage.ABCStorage.CreateStorage, , , 9/0.2 , 5/0.1 , 4/0.1 , 1, 0, 10, 0.20, , 1, 1, 1, 2, 2, 4,
f(), ABCStructuredStorage.ABCStorage.CreateStorage, , , 10/0.2 , 6/0.1 , 5/0.1 , 1, 0, 11, 0.17, , 1, 2, 1, 3, 3, 6,
f(), ABCStructuredStorage.ABCStorage.CreateStream, , , 9/0.2 , 5/0.1 , 4/0.1 , 0, 0, 9, 0.00, , 1, 1, 1, 2, 2, 4,
f(), ABCStructuredStorage.ABCStorage.CreateStream, , , 6/0.1 , 4/0.1 , 4/0.1 , 0, 0, 6, 0.00, , 1, 3, 1, 4, 1, 5,
f(), ABCStructuredStorage.ABCStorage.OpenStorage, , , 13/0.2 , 7/0.1 , 5/0.1 , 0, 1, 14, 0.00, , 1, 2, 2, 4, 3, 7,
f(), ABCStructuredStorage.ABCStorage.OpenStorage, , , 3/0.1 , 1/0.0 , 1/0.0 , 0, 0, 3, 0.00, , 1, 1, 1, 2, 1, 3,
f(), ABCStructuredStorage.ABCStorage.OpenStream, , , 5/0.1 , 3/0.1 , 3/0.1 , 2, 1, 8, 0.67, , 1, 1, 1, 2, 1, 3,
f(), ABCStructuredStorage.ABCStorage.Exists, , , 14/0.3 , 8/0.2 , 6/0.1 , 0, 0, 14, 0.00, , 1, 2, 2, 4, 4, 8,
f(), ABCStructuredStorage.ABCStorage.StorageExists, , , 3/0.1 , 1/0.0 , 1/0.0 , 0, 0, 3, 0.00, , 1, 1, 1, 2, 1, 3,
f(), ABCStructuredStorage.ABCStorage.StreamExists, , , 3/0.1 , 1/0.0 , 1/0.0 , 0, 0, 3, 0.00, , 1, 1, 1, 2, 1, 3,
, Total, , , 93/1.8 , 49/0.9 , 40/0.8 , 4, 2, 99, 1.03, , 13, 18, 15, 33, 24, 57,
, Average, , , 7.15/0.2 , 3.77/0.1 , 3.08/0.1 , 0.31, 0.15, 7.62, 0.08, , 1.00, 1.38, 1.15, 2.54, 1.85, 4.38,
, Maximum, , , 14/0.3 , 8/0.2 , 6/0.1 , 2, 1, 14, 0.67, , 1, 3, 2, 4, 4, 8,
, Minimum, , , 2/0.0 , 0/0.0 , 0/0.0 , 0, 0, 2, 0.00, , 1, 0, 1, 1, 1, 3,
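
For reference, I imagine each "File" summary line turning into an Elasticsearch document roughly like this (the field names are just my first guess, not a final mapping):

    {
      "row_type": "File",
      "file": "src\\ABCStructuredStorage\\ComInterfaces.cs",
      "file_date": "2018-08-18T09:22:24",
      "size": 9297,
      "loc": 140,
      "eloc": 124,
      "lloc": 53,
      "comments": 18,
      "blanks": 24,
      "lines": 177,
      "comment_per_eloc": 0.15,
      "classes": 2,
      "notices": 72
    }

The f() lines would become similar documents, plus the function name and the complexity columns.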

Thanks for posting the sample, that's very helpful!

This looks like the perfect case for using Filebeat to ingest this CS(V) file line-by-line and ship it off the box for further processing (parsing). For the "further processing" step, I'd suggest you start with shipping the data from Filebeat to Elasticsearch via an Elasticsearch Ingest node pipeline to parse each line of data into an Elasticsearch document.

See https://www.elastic.co/guide/en/beats/filebeat/master/configuring-ingest-node.html and https://www.elastic.co/blog/indexing-csv-elasticsearch-ingest-node.
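
As a rough sketch (the pipeline name and field names below are placeholders, not a finished pipeline): on recent Elasticsearch versions the csv ingest processor can split each line directly, while on older versions the grok approach from the blog post above does the same job. Something along these lines:

    PUT _ingest/pipeline/rsm_metrics
    {
      "description": "Parse RSM report lines into fields (sketch)",
      "processors": [
        {
          "csv": {
            "field": "message",
            "target_fields": [
              "row_type", "name", "file_date", "size",
              "loc_fp", "eloc_fp", "lloc_fp",
              "comments", "blanks", "lines", "comment_per_eloc",
              "classes", "functions", "parameters", "returns",
              "intercomp", "cyclocomp", "total_complexity", "notices"
            ],
            "trim": true,
            "ignore_missing": true
          }
        },
        { "convert": { "field": "lines", "type": "integer", "ignore_failure": true } },
        { "convert": { "field": "cyclocomp", "type": "integer", "ignore_failure": true } }
      ]
    }

The combined columns like "140/2.6" (LOC/FP) would need an extra split or script processor, and you would probably drop the header row and the Total/Average/Maximum/Minimum rows either here or in Filebeat.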

At this point you'll have all your data in Elasticsearch indices with each Elasticsearch document corresponding to each line from your CS(V) file. Of course, each document will be a collection of fields with specific data types, all indexed in Elasticsearch, thereby allowing you to perform interesting queries and aggregations fast. Then it's simply a case of creating visualizations and dashboards in Kibana that make the most sense for your analysis.

Hope that helps!

Thanks a lot!
This sounds like I will need to create my own module(s) in Filebeat.
Is there a concept of a custom Filebeat module? All existing modules I checked are service based, but I would need a module which can send data only once, not periodically: line by line from a CSV file, but one time, at the end of the build. Is there an API for this?

If I understood correctly, a pipeline of parsers/processors will be created on the Elastic side from my metrics JSON file, which I should prepare for each metrics type. It should be pushed to Elasticsearch via a 'curl -XPUT' command. This should be done only once per document/metric type, right? If I need to modify the pipeline, should I just re-post the JSON, or delete and post it again, so that the new pipeline is in effect after that?
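
Roughly something like this is what I have in mind (host and pipeline name are just placeholders):

    curl -XPUT 'http://elastic-host:9200/_ingest/pipeline/rsm_metrics' \
         -H 'Content-Type: application/json' \
         -d @rsm_pipeline.json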

Yes, there is a concept of a custom Filebeat module. If you want to do that, you can follow the instructions in the Filebeat Modules Developer Guide: Creating a New Filebeat Module | Beats Developer Guide [master] | Elastic.

AFAIK there is no such API, but what you could do is point Filebeat to a folder where the CSV files will be dumped at the end of each build. That way, when a build finishes and a new file shows up, Filebeat will automatically pick it up and proceed.
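
As a sketch, assuming the build drops the reports into a known folder (the paths below are made up, adjust them to your workspace layout):

    filebeat.inputs:
    - type: log
      paths:
        - /opt/TeamCity/Agent/work/*/metrics/*.csv
      exclude_lines: ['^Type,']     # skip the CSV header row
      pipeline: rsm_metrics         # ingest pipeline that parses each line

    output.elasticsearch:
      hosts: ["http://elastic-host:9200"]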

The ingest node pipeline gets created in Elasticsearch, yes, but that part is handled by Filebeat for you. You just have to specify the JSON for the ingest node pipeline(s) when you create your custom Filebeat module. This is documented as part of the Filebeat Modules Developer Guide I linked to above.
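
For orientation, a custom module (let's call it "rsm", just a placeholder name) ends up as a small directory tree roughly like this, as described in the developer guide:

    module/rsm/
      metrics/                   # one fileset per report type
        manifest.yml             # fileset options, references the files below
        config/metrics.yml       # input configuration template
        ingest/pipeline.json     # ingest node pipeline that Filebeat loads for you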

Thanks a lot for the clarification!

As I see it now, there are several concerns on my side with this design:

First of all, when you talk about MySQL/Postgres/Apache and the like, they are usually persistent 'servers' and their uptime is 'as long as possible'. Build machines are the total opposite: their VMs or containers are created per build and destroyed when the build is completed, so their uptime is 'as short as possible'. This is required to guarantee reproducibility of the build environment and to make sure environment modifications from one build cannot affect another build.

Second, directories/folders. For persistent servers they are well defined, like /var/log/messages. On build machines, compilation happens in a workspace where the git/svn checkout happened, and the path is auto-generated from some root, for example: /opt/TeamCity/Agent/work/b6ccf2c86f2c0a08. The workspace directory structure differs from one build kind to another, so we cannot guess where the CSV files are; we have to know that from inside the build. Another thing: not all build types produce metrics; only some of the compile builds will, not installer builds or test builds. This means we cannot install and configure Filebeat in the build machine's template; we would have to do it during the build.

Third concern: timing/synchronization. When a DB server runs, some log messages appear very randomly; for example, MySQL slow query messages only appear when specific queries are executed, so constantly monitoring the output stream is a 'service job' and messages may reach Elasticsearch with a relatively big delay. During the build, metrics are generated in a specific build step, usually close to the beginning of the build, and we don't know exactly when the build is completed (succeeded/failed), so it's hard for Filebeat to monitor an output folder: it may miss CSV files if the build time is short. A synchronous approach would be better: ask Filebeat to push the CSV files and, when it's done, return the prompt; this would be just another build step.

Pipeline JSON: if Filebeat creates the ingest node pipeline by itself, what happens when we run 300 builds a day, many of them concurrent, and each build tries to create the same pipeline on the Elastic side? Or what happens if, during one metrics data push, another 'create pipeline' statement is sent?

What if you guys discussed a new Filebeat feature: a generic Filebeat module that supports pushing log files on demand, synchronously. The pipeline would be created once, outside of Filebeat, but the log file push would block until the files are delivered. You could even include data processing on the client side, driven by the same JSON file, without pushing that JSON to Elasticsearch.

Hi,

This use case looks pretty ad hoc; I'm wondering if the TCP input would be useful here? https://www.elastic.co/guide/en/beats/filebeat/master/filebeat-input-tcp.html

Basically you can configure filebeat to accept messages on a well-known port, then script your tests to forward all files there. As you said, pipelines could be defined outside this process.
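
A minimal sketch of that setup (the port is an arbitrary choice):

    filebeat.inputs:
    - type: tcp
      host: "localhost:9000"            # Filebeat listens here
      max_message_size: 10MiB

    output.elasticsearch:
      hosts: ["http://elastic-host:9200"]
      pipeline: rsm_metrics             # pre-created pipeline does the parsing

Then a build step just pushes the file over that port once the report exists, for example something like nc localhost 9000 < metrics.csv.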

Does that make sense?
