Fluentd vs Logstash?

Hello! I know this is a board for Logstash, but I was hoping someone might have some experience with Fluentd and be able to talk about why you chose one over the other.

We're evaluating logging solutions at our company and I want to get a sense of what I should be using. All of our logs will end up in Elasticsearch. We just want the best way to get them there.

The general idea that I have from some googling around is that Fluentd is a bit simpler to use, but you lose some of the flexibility that Logstash provides.

If anyone could comment on the performance of either (benchmarks or anecdotes) that would be awesome, too. I haven't found anything conclusive yet.

Commenting here as a previous user of Fluentd - just to be clear, this isn't any official stance on the question, just my personal take having used both Fluentd and Logstash for several years.

In my experience, the biggest differences you'll find stem from the difference between running in the JVM or not.

First, the JVM lets you get good parallelism (see the workers parameter in several output plugins). Fluentd has the multiprocess plugin, which I've used quite a bit, but it's a bit of a workaround: you have to pipe between processes if you want to parallelize the pipeline, which is pretty manual and makes debugging difficult.

The JVM obviously incurs overhead, and you need to be aware of heap space, etc. This can be a blessing or a curse. For example, you may get annoyed by having to set the heap space, but you may wish you could when Fluentd's memory footprint balloons because of queued messages.

Last time I used Fluentd, I was mostly writing regex parsers for logs, so I didn't have the benefit of grok (I think there's a Fluentd plugin for it now). Without grok, I ended up with truly horrifying regexes to parse messages. Grok has been integrated in Logstash from the beginning and has been tuned for performance a lot recently, so having a fast, mature grok parser available is awesome.
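To show why named, reusable patterns beat one monolithic regex, here's a rough Python sketch of the idea behind grok. This is not Logstash's implementation: the `PATTERNS` table, the `compile_grok` helper, and the sample log line are all made up for illustration, and real grok pattern libraries are far more thorough.

```python
import re

# Hypothetical pattern library: small named building blocks,
# loosely in the spirit of grok's pattern files.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "INT": r"\d+",
    "PATH": r"/\S*",
}

def compile_grok(expr):
    """Expand %{NAME:field} references into named capture groups."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return "(?P<%s>%s)" % (field, PATTERNS[name])
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expr))

# Made-up log line for the example.
line = "10.0.0.7 GET /index.html 200"
pattern = compile_grok("%{IP:client} %{WORD:method} %{PATH:path} %{INT:status}")
fields = pattern.match(line).groupdict()
print(fields)
```

The expression passed to `compile_grok` stays readable even as the underlying regex grows, which is the part I missed most when writing raw regexes by hand.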

Fluentd's file buffer plugin is very useful for adding some resiliency to your log pipeline. In Logstash you may need to add something like Redis to address backpressure.
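For anyone unfamiliar with the idea, here's a minimal Python sketch of what a file-backed buffer buys you: events are written to disk first and only removed once the output accepts them. This is only the general concept, not how Fluentd's buffer actually works internally; the `FileBuffer` class, the file name, and the `down` output function are hypothetical.

```python
import json
import os
import tempfile

class FileBuffer:
    """Toy disk-backed buffer: one JSON event per line."""

    def __init__(self, path):
        self.path = path

    def append(self, event):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def flush(self, send):
        """Send all buffered events; keep them on disk if send raises."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path, encoding="utf-8") as f:
            events = [json.loads(l) for l in f if l.strip()]
        if not events:
            return 0
        send(events)          # raises on failure, leaving the file untouched
        os.remove(self.path)  # only reached after a successful send
        return len(events)

buf = FileBuffer(os.path.join(tempfile.mkdtemp(), "buffer.jsonl"))
buf.append({"msg": "a"})
buf.append({"msg": "b"})

def down(events):
    # Stand-in for an output that is temporarily unavailable.
    raise ConnectionError("output unavailable")

delivered = []
try:
    buf.flush(down)
except ConnectionError:
    pass  # events are still on disk and can be retried
sent = buf.flush(delivered.extend)
```

The failed flush loses nothing, and the retry delivers both events. A real buffer would also need chunking, retry limits, and protection against partial writes.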

I'm completely spitballing here, but I would guess performance would be better in the JVM. That said, I would strongly encourage you to just try some benchmarks, because use cases can vary quite a bit. For example, you may be grok/parse-heavy, which may be faster in one or the other, or you may find that results vary with your input method (whether you're getting messages from files, syslog, etc.).
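If you want a starting point for measuring the parse-heavy case, something like the following Python sketch gives you an events-per-second number for a single regex over synthetic lines. It says nothing about Fluentd or Logstash themselves; the regex, the `benchmark` helper, and the generated sample lines are assumptions for illustration, and you'd want to swap in your own log lines and then run the equivalent test against each tool.

```python
import re
import time

# Assumed log layout for the example: client, method, path, status.
LINE_RE = re.compile(r"(?P<client>\S+) (?P<method>\w+) (?P<path>\S+) (?P<status>\d+)")

def benchmark(lines, pattern=LINE_RE):
    """Return (parsed_count, events_per_second) for one pass over lines."""
    start = time.perf_counter()
    parsed = sum(1 for line in lines if pattern.match(line))
    elapsed = time.perf_counter() - start
    rate = (parsed / elapsed) if elapsed > 0 else float("inf")
    return parsed, rate

# Synthetic input; replace with a sample of your real logs.
sample = ["10.0.0.%d GET /item/%d 200" % (i % 255, i) for i in range(50000)]
parsed, rate = benchmark(sample)
print("parsed %d lines at %.0f events/sec" % (parsed, rate))
```

Run it several times and with different inputs, since a single pass on synthetic data can be misleading.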