Custom fields are not stored in the message root


#1

Hi there,

As per http://pastebin.com/EQ1dQVna

Adding custom fields under prospectors: fields: does not give the expected result. Any custom fields are added under a field named "fields", rather than to the root of the message.

E.g., trying to add the custom fields "type": "foo" and "system": "bar" results in

"fields" => { "type" => "foo", "system" => "bar" },

This breaks existing Logstash filter conditionals, as custom fields are no longer in the root of the message. It also causes issues for existing Elasticsearch mappings and for Kibana visualizations and dashboards which rely on these fields being in the correct place.
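To make the breakage concrete, here is a minimal Python sketch that models the two event shapes as plain dicts. It is only an illustration, not Logstash or Filebeat code: the "message" value and the helper name matches_root_type are made up, and the helper merely stands in for a conditional that tests the top-level "type" field.

```python
# Same event, two shapes. Only the placement of the custom fields differs.
nested_event = {
    "message": "GET /index.html 200",           # made-up sample line
    "fields": {"type": "foo", "system": "bar"},  # what Filebeat currently sends
}
root_event = {
    "message": "GET /index.html 200",
    "type": "foo",                               # what LSF-style filters expect
    "system": "bar",
}


def matches_root_type(event: dict, wanted: str) -> bool:
    """Stand-in for a conditional that only looks at the top-level "type"."""
    return event.get("type") == wanted


print(matches_root_type(root_event, "foo"))    # True
print(matches_root_type(nested_event, "foo"))  # False
```

The second check fails because the value now lives at event["fields"]["type"], so every existing conditional, mapping, and dashboard would have to be rewritten against the nested path.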

LSF and log-courier both allow adding arbitrary custom fields to the root of the messages. This should also be possible with Filebeat to maintain compatibility and assist migration.


(ruflin) #2

Hi @ceekay

The main reason fields was moved into its own nested document is to prevent naming conflicts. As filebeat is built on top of libbeat, which also sends data to ES / LS, more fields are already predefined. If your goal is to define the document type with "type" as above, with the latest build you can use document_type inside each prospector.

Coming from LSF, it seems the naming of fields is not obvious. Would it be better to name it "tags" or something similar? "tags" is already used by the base libbeat, but these could be merged.

Does not having the fields in the root impose other limitations, or does it mainly break the existing dashboards?


(Owulff) #3

Hi @ruflin

It's not only the dashboards which are broken. The key problem is that you can't migrate from LSF to Filebeat all at once, which means some applications use LSF and others Filebeat. As a result, the access log information sent from LSF, for example, looks completely different from what Filebeat sends. That means different filters in Logstash, different watcher queries, and finally one dashboard for LSF and another for Filebeat.

IMHO, I don't see the added value in having the fields element at all in the JSON document.

Thanks
Oli


#4

Hi @ruflin

Thanks for your reply.

We rely heavily on custom fields, both for logstash conditionals and dashboards. Some examples we use are subtype, system, and sla. More importantly, we segregate logs from different departments into separate ES indexes with a unique department field, which is critical to our entire ELK stack working correctly.

These fields do not translate well to tags, as we need to be able to refer to them explicitly in the logstash pipeline as well as in dashboards.

Not having custom field compatibility with LSF and log-courier would be a complete show-stopper for us. With the number of systems we have sending to Logstash, re-working everything to account for nested fields would make migration nearly impossible, and would certainly cause breakages along the way.

The ability to add custom fields, not tags, to both topbeat and packetbeat would also be immensely valuable to us for the same reasons we need them in filebeat.

I understand your concern with naming conflicts, but in a case where custom fields are defined, would this not become the user’s issue to solve if they cause a field conflict?


(Owulff) #5

RC1 resolves the issue: setting fields_under_root to true puts the custom fields in the root of the message.

Thanks
Oli
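For readers following along, a rough Python model of what the fields_under_root switch does to the outgoing event. This is a sketch, not Filebeat's implementation: build_event and the sample values are hypothetical, and the rule that a custom field overwrites a predefined field of the same name when merged into the root is an assumption of this model.

```python
def build_event(base: dict, custom: dict, fields_under_root: bool = False) -> dict:
    """Attach custom fields either under "fields" or at the event root."""
    event = dict(base)  # copy so the caller's dict is not modified
    if fields_under_root:
        # Assumption: on a name clash the custom field replaces the base field.
        event.update(custom)
    else:
        event["fields"] = dict(custom)
    return event


base = {"message": "disk usage 91%", "type": "log"}  # made-up base event
custom = {"type": "foo", "system": "bar"}

print(build_event(base, custom))
print(build_event(base, custom, fields_under_root=True))
```

With the flag off, "type" stays "log" and the custom values sit under "fields". With it on, the custom "type" replaces the base one, which is exactly the naming conflict mentioned earlier in the thread.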


(ruflin) #6

@ceekay You mentioned that this would also be useful for topbeat and packetbeat. For packetbeat you mean for example per protocol? Can you give some examples here?


(Steffen Siering) #7

In future (after GA), we plan on adding some filtering support to libbeat (all beats will profit). Adding, removing, and renaming fields sounds like a simple use case for filtering support. Until we've got filtering support, fields and fields_under_root are your go-to solution for filebeat, I would say.


#8

Hi @ruflin

This mainly goes back to being able to split indexes per department, for data segregation and billing reasons - this is what we're already doing with log-courier on prod and filebeat on dev.

I don't have a firm plan for other custom fields for packetbeat and topbeat yet as obviously it's not supported at this stage, but I imagine an SLA field would at least be in there somewhere, especially if the data originates from client systems.

We would input something like "department" => "sysadmins", and output from Logstash to index => "topbeat-%{department}-%{+YYYY.MM.dd}".

This allows us to easily check index disk usage by department, as well as restricting access by index name. For oversight we would simply query topbeat-* with Kibana.
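As an illustration of the naming scheme above, a small Python sketch that builds the same index name from an event's department field and its timestamp. The function index_name and the sample values are hypothetical. It assumes the date part is rendered from the event timestamp in UTC, and the wildcard check at the end only mimics querying topbeat-* for oversight.

```python
from datetime import datetime, timezone
from fnmatch import fnmatch


def index_name(event: dict, timestamp: datetime) -> str:
    """Build "topbeat-<department>-<YYYY.MM.dd>" for one event."""
    day = timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return f"topbeat-{event['department']}-{day}"


event = {"department": "sysadmins"}
stamp = datetime(2016, 1, 15, 12, 0, tzinfo=timezone.utc)  # made-up timestamp

name = index_name(event, stamp)
print(name)                       # topbeat-sysadmins-2016.01.15
print(fnmatch(name, "topbeat-*"))  # True
```

Without a root-level department field there is nothing to interpolate, which is why this setup depends on custom fields being available for every beat.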

@steffens You replied while I was writing this :) It sounds like a great plan, as it adds a lot of flexibility to the pipeline.

Thanks!


(Steffen Siering) #9

@ceekay for now you can try to adapt your events in logstash, though. All beats publish a field [@metadata][beat]. By using a conditional filter to check if [@metadata][beat] is present (that is, your event is coming from beats), you can use the mutate filter to massage your events to your needs. E.g. for topbeat, [@metadata][beat] will be 'topbeat'.
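The idea can be sketched in Python, with the event modelled as a dict. This is not Logstash configuration, only the logic: lift_custom_fields is a hypothetical helper, and the choice to move everything under "fields" to the root without overwriting existing keys is just one possible way of adapting the event.

```python
def lift_custom_fields(event: dict) -> dict:
    """If the event came from a beat, move its nested custom fields to the root."""
    if "beat" not in event.get("@metadata", {}):
        return event  # not from a beat: leave untouched
    lifted = {key: value for key, value in event.items() if key != "fields"}
    for key, value in event.get("fields", {}).items():
        lifted.setdefault(key, value)  # keep any existing root field
    return lifted


beat_event = {
    "@metadata": {"beat": "topbeat"},
    "message": "cpu 12%",                     # made-up sample
    "fields": {"department": "sysadmins"},
}
other_event = {"message": "from another shipper", "department": "ops"}

print(lift_custom_fields(beat_event))
print(lift_custom_fields(other_event))
```

Events without [@metadata][beat] pass through unchanged, so the same pipeline can serve beats and non-beats senders during a migration.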


#10

@steffens that's not the problem so much - for us it's more knowing which type of system the data has originated from. The "department" field is the most important in my previous example.

Aside from our internal systems, we provide development and hosting services for a large number of external clients. If we want to add, say, owner => "clientx" to a message, it's not very easy without custom fields.

For testing, I'm using an array of tags and converting the values to fields as soon as they arrive at Logstash, but that's extremely fragile as they need to be in exactly the right order for this to work correctly. It also seems like unnecessary processing for every message.

It also means that if we want to add an arbitrary field later on - for example to define a new aggregation in Kibana for a particular client or system - we need to modify all our filters to convert this tag to a field, while retaining compatibility with messages that don't have this tag. I can see that getting very messy, very quickly.
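A short Python sketch of why the tag-to-field workaround is fragile. The field order and tag values are hypothetical and the helper tags_to_fields is not taken from any real filter. It only models the positional mapping described above.

```python
# Hypothetical agreement between shipper and Logstash: tag N becomes field N.
FIELD_ORDER = ["department", "system", "sla"]


def tags_to_fields(tags: list) -> dict:
    """Map tags to field names purely by position."""
    return dict(zip(FIELD_ORDER, tags))


print(tags_to_fields(["sysadmins", "web", "gold"]))
# {'department': 'sysadmins', 'system': 'web', 'sla': 'gold'}

print(tags_to_fields(["web", "sysadmins", "gold"]))
# {'department': 'web', 'system': 'sysadmins', 'sla': 'gold'}
```

Swapping two tags silently produces a wrong department, and adding a new field means extending FIELD_ORDER everywhere while still handling shorter tag lists from older senders.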


#11

@steffens @ruflin I see fields for all beats will most likely be making it into v5. This will make my life a lot easier - thanks very much!


(ruflin) #12

@ceekay Yes, it will be part of the 5.0 release.

