Configuring a new environment with Logstash

Hi there,
Basically I have an entire product using log4net, and I want to start collecting all of these logs in a centralized Elasticsearch. I have done a lot of research over the past two weeks and would like your recommendations too. I have come up with a few approaches that seem feasible:

a. One approach is to install Logstash on every server and modify each service to add another log4net appender that sends logs to the local Logstash; Logstash on each individual machine then forwards this data to Elasticsearch.

b. The other approach is to have a centralized Logstash and make the log4net appender of each service send logs to that central server, which then connects to Elasticsearch. Approach 'a' seems better than 'b' to me, since 'b' puts more of the overhead on a single server, which could reduce efficiency.

c. The third approach is to run Filebeat on each server to gather the logs and send them to a centralized Logstash, which then connects to Elasticsearch. The advantage of this approach is that we would not need to modify any service/application.

Lastly, I would also like to know whether you'd advise running Elastic Cloud or running Elasticsearch on one of our VMs.

I have created a new topic for this mainly because of our requirements here, listed by priority:

  1. LEAST amount of overhead on each server.
  2. Making use of log4net so that we don't have to create a separate filter for every service, since each one generates logs differently (basically something like type => "log4net" in Logstash; see the rough sketch after this list).
  3. Lower cost.
  4. Real-time analysis in Elasticsearch.
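
To illustrate what I mean in point 2, this is the rough idea I have in mind on the Logstash side. It is just a sketch, not a working config, and the port is arbitrary:

```
input {
  tcp {
    port => 5000          # whatever port the log4net appender would send to
    type => "log4net"     # tag every event so one shared filter block can handle them all
  }
}

filter {
  if [type] == "log4net" {
    # a single common filter for all services would go here
  }
}
```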

Sorry for the length of the question. Any advice would be really helpful. THANK YOU.

Option (a) puts a relatively heavy Java process on each of your edge nodes, which is less than ideal. Logstash can be a super-powerful pipeline enrichment tool, but it's pretty heavy for use as merely a log shipper. In fact, that's a big part of why option (c) exists: Filebeat (and Metricbeat and all the other Beats) is super lightweight and purpose-built for this.

Option (c) also has another advantage over option (b): the Beats protocol handles back-pressure really well, so if the centralised Logstash host gets temporarily swamped, the Beats slow down their transmission of data until the Logstash node can catch up. Batches are also acknowledged as part of the protocol, so we can ensure everything that is supposed to get from A to B actually does.
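For what it's worth, the receiving side of option (c) on the centralised Logstash host can be quite small. Here is a minimal sketch; the port, host, and index name are placeholders you would swap for your own values:

```
input {
  beats {
    port => 5044                          # Filebeat ships here over the Beats protocol
  }
}

output {
  elasticsearch {
    hosts => ["http://your-es-host:9200"] # placeholder; point this at your cluster
    index => "logs-%{+YYYY.MM.dd}"        # daily indices; adjust to taste
  }
}
```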

As for Elastic Cloud vs running your own VM, it really depends on your use-case. Elastic Cloud simplifies a lot of the complexity of managing one or more clusters and adds a lot of purpose-built tooling for seeing what's going on in your clusters, spinning up new clusters quickly, etc., but if you are already staffing Elasticsearch experts, there may be reasons why you would want to run your own. There's also Elastic Cloud Enterprise, a self-hosted version of Elastic Cloud that runs on most of the major cloud providers, giving you a good balance of on-prem benefits (such as the DC-locality that is required by law for certain types of data) alongside the benefits of Elastic Cloud.

Thank you so much for the detailed answer.

Just one last doubt: how do I get around creating a new filter in Logstash for each service? All the services generate logs in a different manner, and it's almost impossible to create a general filter for even a couple of them.

The only thing they have in common is that they all log using log4net, so I was hoping I could use that to my advantage. I tried a NuGet package called log4net.Elasticsearch, which does exactly what I want; the only issue is that it's not up to date with the latest log4net, so I can't use it on all my services.

Do you have any idea what the best way to implement my centralized logging infrastructure would be here?

Thanks.

There is no "magic bullet" when it comes to parsing logs, but if your logs have a fair bit in common (e.g., a common prefix for information such as timestamp/host/service name, as should be provided by having a common logging framework in your applications), it's pretty straight forward to have a single pipeline that first extracts the common bits all at once, with specialised filters that run on specific subsets.

If you'd be willing to share a subset of a few dozen log messages (preferably showing both their diversity and commonality), I'd be glad to help you bootstrap a pipeline config.

If you are using the structured logging features of log4net, you can configure your applications to output that structure in a format like JSON, which can easily be read and expanded with the JSON filter plugin and/or the JSON codec. That way you would not have to build distinct grok patterns, but could rely on the existing structure.
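As a rough sketch, with a JSON layout in place the Logstash side could shrink to something like this (the field names depend entirely on the layout you pick, so treat them as placeholders):

```
filter {
  # parse the JSON document the application emitted; no grok needed
  json {
    source => "message"
  }

  # the timestamp field name depends on your JSON layout; adjust accordingly
  date {
    match => ["timestamp", "ISO8601"]
  }
}
```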

Yes, this. Whenever possible I really really recommend reconfiguring applications to emit JSON logs.
