Processing IIS logs in Filebeat - regexp? documentation?

Hi all,

I've been trying to find a functional method to process IIS logs and send them to elasticsearch without using Logstash.

I have seen mention of using regexp to pre-process inputs, and I'm trying to develop patterns for this.

The problem is, my well-experienced google-fu skills have provided little help in terms of how to do this or how to implement it into filebeat. Is this possible?

I found a few different pull requests here and here related to the issue of regexp documentation in Beats.

So, does anyone have an example configuration they can share, pointers to documentation I might be able to use, or advice on how to do this?

Thanks for any assistance.

Filebeat can only do simple filtering to exclude lines containing a particular keyword or matching a regexp. It cannot parse logs and split them into fields.
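For reference, that filtering looks roughly like this (a sketch; the path is an example, and the option names are as documented for the Filebeat 1.x prospector config):

```yaml
filebeat:
  prospectors:
    - paths:
        - C:\inetpub\logs\LogFiles\W3SVC1\*.log
      # Drop the W3C comment header lines that IIS writes (#Fields:, #Date:, ...)
      exclude_lines: ["^#"]
      # Or keep only lines matching a regexp, e.g. only 5xx responses:
      # include_lines: [" 5[0-9][0-9] "]
```

Everything that survives the filter is still shipped as a single `message` field; no field splitting happens.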

For the moment you need Logstash for that. With the 5.0 release, though, you will be able to parse logs with Elasticsearch itself! See the docs here, but note that this won't be released for another few months.
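To give a flavour of what that will look like, here is a sketch of an ingest pipeline with a grok processor, based on the pre-release docs (the request body format may still change before release, and the grok pattern is illustrative, not a complete IIS W3C pattern):

```json
PUT _ingest/pipeline/iis-logs
{
  "description": "Parse IIS W3C log lines (illustrative pattern only)",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} %{URIPATH:uri} %{NUMBER:status:int} %{NUMBER:time_taken:int}"
        ]
      }
    }
  ]
}
```

Filebeat would then send events through that pipeline and Elasticsearch itself does the field splitting.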

I'm not sure why you don't want to use Logstash, but my current setup is
IIS logs > nxlog > Logstash > Elasticsearch. My smallest ES cluster is receiving about 10GB of raw IIS logs per day. It's very simple to set up nxlog, and Logstash without too many filters won't be a bottleneck.

Thanks @tudor, this is amazing news... I'm assuming that ES 5.0 won't have plugin support in the way Logstash currently does, and that it will probably be something like grok patterns.
Do you by chance know any more about this, or have a link to further details?
I clearly misunderstood the purpose of regexp, and I'll need to look into other options (Logstash being top of the list).

Thanks @anhlqn, I'm going to look into nxlog as a solution, but I had been hoping that I could pipe all my logs directly to my ES cluster. At a large scale, I feel that Logstash can be a bottleneck/SPOF (I do wonder if nxlog would introduce the same issue), and it's important to monitor and know when LS is causing a bottleneck. I'd like to avoid that scenario.
I understand there are methods to cluster LS, but I'm looking for a simple design so that it can scale more easily.

Thanks both of you.

One last question to anyone who may know:
Is there any way to use Filebeat with the effect of pre-processing? For example, if the Filebeat log input is properly formatted JSON, would that help with splitting into fields?
My understanding is that this wouldn't require any real processing, just an understanding of JSON.

Nxlog is very efficient and lightweight, so it should not cause a bottleneck. As for Logstash, without heavy filters it can process about 15K events/messages per second with the default number of filter workers. What is your total daily IIS log volume?

If you don't mind the license fee, nxlog enterprise can export directly to ES with the om_http module.

@plonka2000 Some more details on the ingest node can be found here:

Currently filebeat does not support JSON docs / structured logs but we are thinking about adding it:

We have a similar setup to yours. We did some testing a little while ago to see what kind of throughput we could get with NXLog sending to Logstash.
We had one instance of NXLog and one instance of Logstash. We then pulled a 3-million-line IIS log file off of our production servers and had NXLog start from the beginning. Logstash had no filters and its output was set to null.
We were able to get just under 10k events per second. We then tried adding two machines with NXLog sending to the same Logstash instance. Speeds basically doubled. Unfortunately we didn't have any more free machines to see how much more our one Logstash instance could take. But it was handling 20k events per second fairly easily.
We also tried having Logstash run some grok, kv, and mutate filters. Speeds dropped to around 7k events per second for the one NXLog instance.
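A minimal sketch of that kind of filter config (the grok pattern here is trimmed for illustration and is not a full IIS W3C pattern; field names are examples):

```
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} %{URIPATH:uri} %{NUMBER:status:int} %{NUMBER:time_taken:int}" }
  }
  mutate {
    # Drop the raw line once it has been parsed into fields
    remove_field => [ "message" ]
  }
}
```

Grok is usually the expensive part; anchoring patterns and dropping unneeded fields helps keep throughput up.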

So NXLog was pretty quick, but Logstash still seemed to be quicker. NXLog was also converting the data to JSON; it's possible it would go much faster if that was removed.