Help jump from logstash to beats

Hi again here!

I'm going to try replacing logstash with beats as the shipping mechanism (as part of a migration from 1.5 to 5.x), but I need some help understanding the role of each piece.
Currently, I'm reading and parsing (with custom grok filters) log files (mainly from streaming apps, so no typical log lines) in two steps: first, on the origin server, a local logstash instance reads from file, adds tags depending on app and file folder, and sends to redis. A second, central logstash pulls from redis, reads the tags, and outputs to Elasticsearch, switching the index based on tags (so, a different index for every app).
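Roughly, the output side of my central logstash looks like this (app names and hosts here are just examples):

```conf
# Central logstash: route events to a per-app index based on tags
output {
  if "app_a" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "app_a-%{+YYYY.MM.dd}"
    }
  } else if "app_b" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "app_b-%{+YYYY.MM.dd}"
    }
  }
}
```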

By reading and re-reading the documentation, I realize how much ELK has evolved, and now I'm planning to use filebeat to send to the central logstash... but here doubts arise in understanding the "whole picture":

  • From my readings, I think my classic input file -> tag-based log tracking/filter logic would have to be replaced by document_type and type filter logic... could someone confirm?
  • Less clear to me is the role of filebeat modules here... The more I read, the more confused I get. Do I need to create a module for every app? I plan to keep parsing in logstash (with grok)... I don't understand where modules fit in the equation. Can I get rid of them? Could someone clarify?
  • I'm absolutely lost when reading that, on the Elasticsearch side, I have to "load the index templates manually"... what happens here? I have never heard about "elasticsearch index templates"... I just want to output logstash -> elasticsearch; what changes? Which templates should I load? The filebeat-related ones? My app-related ones?

Thank you very much in advance.
Best regards.

From my readings, I think my classic input file -> tag-based log tracking/filter logic would have to be replaced by document_type and type filter logic... could someone confirm?

You can still apply tags and custom fields to events in beats. The document_type option was supposed to set the _type and type fields in ES. But with ES removing support for _type in the future, the default value of _type is docs since filebeat 5.6.
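For example, a minimal filebeat 5.x prospector carrying tags and custom fields might look like this (paths and names are placeholders):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/myapp/*.log
    tags: ["myapp", "streaming"]   # usable in logstash conditionals, e.g. if "myapp" in [tags]
    fields:
      app: myapp                   # custom field, lands under fields.app by default
```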

Less clear to me is the role of filebeat modules here... The more I read, the more confused I get. Do I need to create a module for every app? I plan to keep parsing in logstash (with grok)... I don't understand where modules fit in the equation. Can I get rid of them? Could someone clarify?

Modules are more or less a full package, including ES Ingest Node settings and Kibana dashboards. Filebeat modules are extensions to Filebeat's functionality (like Logstash modules). You don't have to use modules if you don't need them. Since you have everything in place already, just configure the prospectors.

Having said this, if you have any well-known services in place and you're willing to invest the time to build modules, contributions are very welcome. Unfortunately, Filebeat modules do not support Logstash right now.

I'm absolutely lost when reading that, on the Elasticsearch side, I have to "load the index templates manually"... what happens here? I have never heard about "elasticsearch index templates"... I just want to output logstash -> elasticsearch; what changes? Which templates should I load? The filebeat-related ones? My app-related ones?

Elasticsearch creates a mapping (a kind of schema) per index (even in 1.x). The schema for a new index can be pre-defined using templates. If no template is given, field types are determined dynamically by Elasticsearch. Important: all fields with the same name must have the same type. When using Filebeat -> Elasticsearch, filebeat installs a template with types for some common fields. Logstash can install a template for you as well.
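As an illustration, a minimal 5.x index template (field names here are hypothetical) could be loaded via the Kibana Console like this; note that in 6.x the `template` match key becomes `index_patterns`:

```json
PUT _template/myapp
{
  "template": "myapp-*",
  "mappings": {
    "_default_": {
      "properties": {
        "clientip":      { "type": "ip" },
        "response_time": { "type": "float" }
      }
    }
  }
}
```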

Hi! ... thanks for your help.

So, thanks to your point, I realize that I have to read carefully about tag usage in those so-called 'prospectors'... because by adding proper tags upon data ingest from the log file, my logstash filters may work as they are.
Also, this confirms my thought that the beat module stuff does not play any role in my scenario.

Regarding EL (I guess that goes beyond this forum section), it has historically been so 'automagic' that I have never cared about its internals, beyond server clustering for HA and performance... What I did in the past was just let a logstash instance output to EL, switching the target index based on tag presence (if/else on the logstash output). The date is added to the index name, so, for N services, I get N new indices every day... and a curator cron deleting the oldest ones... probably spartan, but it has worked fine for me; I could create incredible dashboards in kibana 3 without knowing anything about 'schemas'...
Now I'm afraid switching to EL 5.x requires some further understanding/configuration of those 'schemas'.

I have shared my logstash filters and related grok patterns for every service I have configured: https://github.com/alexolivan/logstash_filters
By reading you, I understand that by porting the filter/analysis logic of my logstash filters to those "beat modules", I could have beats sending directly to EL without an intermediate logstash, but, in that case, EL should be configured accordingly with some of those 'templates'... is this correct?

'Unfortunately Filebeat modules do not support Logstash right now.' ... this scares me a little bit... can't filebeat send to logstash? I assume it can... or is this related only to beats' filtering/analysis capability working together with logstash?

One final question arises from reading your reply... what about geoip? Is filebeat capable (through a 'module') of performing geoip data enrichment? Or is logstash still needed for that?

Thank you very very much for your advice!
Best regards!

Now I'm afraid switching to EL 5.x requires some further understanding/configuration of those 'schemas'.

You can still treat it as 'automagic'. But if things go wrong, you will have to understand how mappings work and might want to look further into templates. This is as true for 1.x as it is for 5.x and the upcoming 6.x. If it has worked OK for you so far, I don't see why it should break with 5.x. The only problem I can think of might be the removal of _type in 6.x. In case you make use of _type, consider introducing a type field.
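A sketch of introducing such a type field on the logstash side (the value is just an example):

```conf
filter {
  mutate {
    add_field => { "type" => "myapp" }   # query on `type` instead of relying on _type
  }
}
```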

By reading you, I understand that by porting the filter/analysis logic of my logstash filters to those "beat modules", I could have beats sending directly to EL without an intermediate logstash, but, in that case, EL should be configured accordingly with some of those 'templates'... is this correct?

Yes, by turning these into modules you can have filebeat ship directly to ES, and ES would do the processing/parsing. When implementing modules, we also define the fields a module generates in a file called fields.yml. This fields.yml is also used to generate docs and to generate the index pattern in Kibana (e.g. this field is in bytes, or is a percentage), as Kibana applies some additional mapping on an index via index patterns. Once fields.yml is defined, filebeat will automatically pick up the definitions and install the templates and such.

'Unfortunately Filebeat modules do not support Logstash right now.' ... this scares me a little bit... can't filebeat send to logstash? I assume it can... or is this related only to beats' filtering/analysis capability working together with logstash?

In the future we want modules to work across all products. But apparently it's not that easy, partly because logstash and ingest node configurations are somewhat different. With 6.x one can use filebeat modules, but one needs to run `filebeat setup` in order to install the template and the kibana index pattern, and logstash is used merely as a proxy, forwarding events to the ingest pipeline as defined by filebeat.
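On the logstash side, that 6.x 'proxy' setup looks roughly like this (host is a placeholder):

```conf
input { beats { port => 5044 } }
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    manage_template => false   # the template was already installed by `filebeat setup`
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    pipeline => "%{[@metadata][pipeline]}"   # forward to the ingest pipeline chosen by the module
  }
}
```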

One final question arises from reading your reply... what about geoip? Is filebeat capable (through a 'module') of performing geoip data enrichment? Or is logstash still needed for that?

filebeat does no processing of events. It's basically a shipper, focusing only on the task of forwarding logs to downstream systems. Geoip can be done either in logstash or in the Elasticsearch Ingest node via the geoip processor (which requires the geoip plugin to be installed, as it's very big and not shipped with ES by default).
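In logstash it's just the geoip filter; a minimal sketch (the clientip field name is an example):

```conf
filter {
  geoip {
    source => "clientip"   # field holding the IP; adds geoip.location, geoip.country_name, ...
  }
}
```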

Given that you have all the processing already in logstash, developing filebeat modules on top of that would be overkill for you. As a first step, I'd recommend just having filebeat point to redis, so that you don't have to change downstream event processing. Given that logstash introduced persistent queuing (the receive queue writes events to disk before additional processing), you might also consider removing redis from your architecture (do some testing first). If you are still interested in ES Ingest Node and Filebeat modules, we can give you some more tips and pointers (feel free to open another discussion), but for now I'd say keep it simple and try only one change at a time.
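Pointing filebeat at redis is a one-block change in filebeat.yml (host and key are examples):

```yaml
output.redis:
  hosts: ["redis-host:6379"]
  key: "filebeat"   # the redis list the central logstash reads from
```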

...aha.

Your suggestion of shipping straight to redis from filebeat sounds good... although my idea was to not even use redis, since it is overkill; I only used it when playing with ELK because it appeared in the howtos I bookmarked.

I will try both with and without redis, since from reading the beats documentation I got the impression that beats has natively integrated load balancing and failover capabilities when shipping to logstash (or at least, that is what I understood), a fact that I found fascinating! So much so that it pays off to struggle with it and give it a try.

Best regards!

The more services you can remove in your system, the better :wink:

Right, beats supports load balancing to multiple Logstash endpoints. The LB is somewhat 'dynamic', using a shared work queue in beats. If one LS instance is blocked or slower than the others, it will retrieve less work from the work queue. If one LS errors, beats will return the events to the retry queue for processing by another LS endpoint. The failed endpoint will not receive any new work until beats is able to reconnect to that instance. If you have multiple redis servers, filebeat can also load-balance across those servers.
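Enabling this on the filebeat side is just (hosts are examples):

```yaml
output.logstash:
  hosts: ["ls1:5044", "ls2:5044"]
  loadbalance: true   # distribute across all hosts instead of picking one at random
```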

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.