Remove packetbeat references from libbeat


(ruflin) #1

As packetbeat was the first beat, there are still quite a few references to it in the libbeat library. The most obvious one is that the fallback elasticsearch index is called packetbeat. So for topbeat, for example, the index packetbeat has to be used. I suggest the following:

  • Rename the fallback index to beat
  • Remove general references from libbeat to packetbeat

In addition, I would suggest adding the index to the default config file of topbeat.


(Mark Walkom) #2

Thanks for pointing these out!

If you don't mind, it'd be great if you could raise an issue on GH so we can arrange to have these changes made :slight_smile: https://github.com/elastic/libbeat


(ruflin) #3

Hi Mark

I can also open a pull request with the changes. The main reason I brought it up here is that I think it will have a broader effect, for example on the getting started guide of packetbeat here, and I assume you guys are more aware of all the other places this could have an effect :wink:


(Mark Walkom) #4

Wow! Thanks heaps for that :smiley:


(Tudor Golubenco) #5

After going quickly through the list of "packetbeat" search hits in libbeat, I think there are two instances where the name is actually important:

  • the default value for the ES index name, that you mention as well
  • the default file name for the File output, here

Both of these are in the output plugins, which are initialized by the publisher in the beats with code like this:

if err = publisher.Publisher.Init(Config.Output, Config.Shipper); err != nil {
	logp.Critical(err.Error())
	os.Exit(1)
}

I'm thinking of adding the Beat name to the Init() method so that the default index can be "packetbeat" for Packetbeat, "topbeat" for Topbeat, etc., because defaulting to "beat" or something common is likely not what the user wants.

We also took this approach for the configuration file name, for example. What do you think?
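
A minimal sketch of that idea, assuming a simplified output config: unset values fall back to the Beat name, while anything set by the user is kept. The `OutputConfig` type and the `applyDefaults` function are hypothetical names for illustration and are not part of libbeat.

```go
package main

import "fmt"

// OutputConfig is a hypothetical, simplified stand-in for the output
// section of a Beat's configuration (not the real libbeat type).
type OutputConfig struct {
	Index string // Elasticsearch index name; empty means "not set"
	Path  string // file name for the File output; empty means "not set"
}

// applyDefaults fills every unset field with the Beat name and leaves
// values that the user configured untouched.
func applyDefaults(beatName string, cfg OutputConfig) OutputConfig {
	if cfg.Index == "" {
		cfg.Index = beatName
	}
	if cfg.Path == "" {
		cfg.Path = beatName
	}
	return cfg
}

func main() {
	// No index configured: the Beat name is used.
	fmt.Println(applyDefaults("topbeat", OutputConfig{}).Index)
	// Index configured: the user's value wins.
	fmt.Println(applyDefaults("topbeat", OutputConfig{Index: "metrics"}).Index)
}
```

With this shape, Packetbeat would end up with "packetbeat" and Topbeat with "topbeat" without either of them hard-coding the name inside the shared library.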


(ruflin) #6

I think it makes sense to use the beat name as the default for the index and the file output. But it should still be possible to override both in the config. This is not a priority, though; the same goes for the config file.


(ruflin) #7

I had a closer look at the code and I think it would make sense to move some of the code from topbeat and packetbeat directly into libbeat, for example the printing of the version or parts of reading the command line flags. This would also make it more obvious what must be passed to libbeat so that it has all the information to execute the moved code. The goal here would be to simplify the setup of new beats, meaning fewer lines of code. The information about every beat could be put into a struct (name, version, ?). I'm not sure why it must be possible to overwrite the version through command line flags, as is mentioned in the code.
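
To illustrate the struct idea, here is a rough sketch in which the per-beat information lives in one struct and the shared flag handling and version printing hang off it. The `Beat` type, its fields, and the `ParseFlags` and `VersionString` methods are assumptions made for this example, not existing libbeat code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Beat is a hypothetical struct bundling the information that the
// shared code would need about each individual beat.
type Beat struct {
	Name    string
	Version string
}

// VersionString builds the line that is printed for the version.
func (b Beat) VersionString() string {
	return fmt.Sprintf("%s version %s", b.Name, b.Version)
}

// ParseFlags handles the command line flags common to all beats.
// It reports whether the version was requested.
func (b Beat) ParseFlags(args []string) (showVersion bool, err error) {
	fs := flag.NewFlagSet(b.Name, flag.ContinueOnError)
	fs.BoolVar(&showVersion, "version", false, "print version and exit")
	err = fs.Parse(args)
	return showVersion, err
}

func main() {
	// Everything specific to this beat is declared in one place.
	b := Beat{Name: "dfbeat", Version: "0.0.1"}

	showVersion, err := b.ParseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if showVersion {
		fmt.Println(b.VersionString())
		return
	}
	fmt.Println("starting", b.Name)
}
```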


(Tudor Golubenco) #8

Yeah, I kept looking for ways to simplify the main() routine of each Beat. I guess we could make it even smaller by providing some sort of "meta" package that will call the initialization and command line flag handling for all the others. Is that what you had in mind?

Changing the version is handy for building: we don't need to edit it in the source code every time we release something. But I think it can be changed even if it's in libbeat (not sure), or we can pass it as a parameter.
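
For reference, one common way in Go to set a version at build time without editing the source is the linker's `-X` flag. This is a different technique from the override mentioned above and is shown only as a sketch; the variable name `version` and the helper `versionLine` are hypothetical.

```go
package main

import "fmt"

// version is the default kept in the source. A release build can
// replace it at link time, for example:
//
//	go build -ldflags "-X main.version=1.0.0"
var version = "dev"

// versionLine formats the string a beat would print for its version.
func versionLine(beatName string) string {
	return fmt.Sprintf("%s version %s", beatName, version)
}

func main() {
	fmt.Println(versionLine("dfbeat"))
}
```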


(ruflin) #9

The meta layer was exactly what I had in mind. My preferred option here is to create a third beat to follow my personal rule of three. As soon as we have three different implementations, it normally gets very obvious which parts are shared and can be abstracted and which have to be implemented individually. I was thinking of building the "iostat" or "df" beat as it is quite simple. Monitoring disk space in particular is very common.

About the version number: if the one in the source code is not necessarily identical to what is released, why have it there? Doesn't this lead to the problem that a build is not necessarily reproducible? I would suggest we first focus on the topic above, as perhaps it automatically leads to a solution for the version.


(ruflin) #10

@tudor As code often says more than a thousand words, here is the implementation of dfbeat: https://github.com/ruflin/dfbeat

I split up the implementation into two parts / packages. The beat package contains everything that I think should belong in libbeat. The main package contains everything which belongs to dfbeat. Some functions would have to be more extensible for packetbeat but I think it is a good start to show the direction I'm thinking.

One problem that I still have is with the Input config. I haven't figured out yet how to make it possible for the InputConfig struct to be defined in the main implementation but still be taken into account when the config file is loaded.

Have a look at it and let me know what you think.


(Tudor Golubenco) #11

As code often says more than a thousand words, here is the implementation of dfbeat: https://github.com/ruflin/dfbeat

:heart:

The main function sure looks nice now.

Some functions would have to be more extensible for packetbeat but I think it is a good start to show the direction I'm thinking.

That would be my only concern: whether we can keep the functionality we have in Packetbeat's main function. I think it should be doable.

One problem that I still have is with the Input config. I haven't figured out yet how to make it possible for the InputConfig struct to be defined in the main implementation but still be taken into account when the config file is loaded.

Hmm, I see. Would it be bad to leave the Config structure in the Beat? It could be that some Beats want to have more top level sections (Packetbeat already does, I think).
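
One possible shape for this, as a sketch only: the Beat defines its own config struct, embeds the shared sections, and hands a pointer to a generic loader. `CommonConfig`, `DfbeatConfig`, and `loadConfig` are hypothetical names, and JSON from the standard library stands in for the YAML config files purely to keep the example self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CommonConfig holds the sections shared by all beats. In this sketch
// it is the part that would live in the shared library.
type CommonConfig struct {
	Output  map[string]map[string]string `json:"output"`
	Shipper struct {
		Name string `json:"name"`
	} `json:"shipper"`
}

// DfbeatConfig is defined by the Beat itself. It embeds the shared
// sections and adds its own top level input section.
type DfbeatConfig struct {
	CommonConfig
	Input struct {
		Period int `json:"period"`
	} `json:"input"`
}

// loadConfig is a hypothetical shared loader. It does not know the
// concrete struct; the Beat passes a pointer to whatever it defines.
func loadConfig(data []byte, target interface{}) error {
	return json.Unmarshal(data, target)
}

func main() {
	raw := []byte(`{
		"input":   {"period": 10},
		"shipper": {"name": "host1"},
		"output":  {"elasticsearch": {"index": "dfbeat"}}
	}`)

	var cfg DfbeatConfig
	if err := loadConfig(raw, &cfg); err != nil {
		panic(err)
	}
	// Shared and Beat-specific sections are filled in a single pass.
	fmt.Println(cfg.Input.Period, cfg.Shipper.Name, cfg.Output["elasticsearch"]["index"])
}
```

The design choice here is that the Config structure stays in the Beat, so extra top level sections remain possible, while the shared sections are still declared only once.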


(ruflin) #12

I think the best approach is that I try to restructure packetbeat the same way I did with dfbeat. If we end up with the same "beat" package for both repositories, it will just be a copy and paste into libbeat.

About the Config: When I look at the different config files, I see that about 90% of the content is duplicated. As I assume we will very soon have lots of beats, it will be cumbersome to keep all of them up to date in case something changes in the shipper or the output part of the config. So I was thinking of having a small Python(?) script inside libbeat that generates the config files. Each beat only needs to implement the parts which are different from the default config, and the other parts will be auto-generated. So the minimal beat would not even need a config file, but the dev and default config files to run the beat would be generated.
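
The core of such a generator is layering a beat's own settings over a shared default. Below is a rough sketch of that merge step, written in Go rather than as a script, with a hypothetical `merge` function and made-up config keys; JSON output is used only to print the result.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// merge returns a copy of base with the values from override layered
// on top. Nested maps are merged recursively; other values are replaced.
func merge(base, override map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		baseMap, baseIsMap := out[k].(map[string]interface{})
		overMap, overIsMap := v.(map[string]interface{})
		if baseIsMap && overIsMap {
			out[k] = merge(baseMap, overMap)
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Shared defaults that every beat would inherit (illustrative keys).
	defaults := map[string]interface{}{
		"shipper": map[string]interface{}{"name": ""},
		"output": map[string]interface{}{
			"elasticsearch": map[string]interface{}{"enabled": true, "index": "beat"},
		},
	}
	// Only the parts that differ for one specific beat.
	dfbeat := map[string]interface{}{
		"input": map[string]interface{}{"period": 10},
		"output": map[string]interface{}{
			"elasticsearch": map[string]interface{}{"index": "dfbeat"},
		},
	}

	generated, err := json.MarshalIndent(merge(defaults, dfbeat), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(generated))
}
```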

So in case we have "fundamental" changes to the config file, it would be nice to control it as far as possible from libbeat, so that not all beats have to be updated to change it. For the moment I think it is easier to keep it inside the beat itself. The only solution I could think of is that we could read the config twice, once for the input part and once for the rest. But that is also not a good solution. I assume some of these points could become more obvious during the refactoring of packetbeat.

