Multiple grok rules in ingest node processor definition

Hey, Elastic pioneer here, testing 5.0alpha1 :slight_smile:

In the ingest node, I was able to define a grok processor, but some log types need multiple grok rules (e.g. firewall logs). In Logstash, you'd simply specify an array of match definitions; if one matches, the line is parsed. For example:

   grok {
     match => [
      "cisco_message", "%{CISCOFW106001}",
      "cisco_message", "%{CISCOFW106006_106007_106010}",
      "cisco_message", "%{CISCOFW106014}"
   ...

For the ingest node, if a grok processor fails, it throws an exception. So the only way I could find to add multiple rules is to chain multiple such processors via on_failure. With more than a couple of rules, the pipeline definition gets pretty hairy:

        "grok": {
          "field": "cisco_message",
          "pattern": "%{CISCOFW106001}",
          "on_failure": [
            {
              "grok": {
                "field": "cisco_message",
                "pattern": "%{CISCOFW106006_106007_106010}",
                "on_failure": [
                  {
                    "grok": {
                      "field": "cisco_message",
                      "pattern": "%{CISCOFW106014}"
                      ...

Would a pattern array make sense, to emulate Logstash's behavior? Should I open an issue on GitHub or is there a better way to handle this already?
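To be concrete, I'm imagining something like this (hypothetical syntax that simply mirrors Logstash's match array, not anything that exists today):

    "grok": {
      "field": "cisco_message",
      "patterns": [
        "%{CISCOFW106001}",
        "%{CISCOFW106006_106007_106010}",
        "%{CISCOFW106014}"
      ]
    }

The first pattern that matches would parse the line, and the processor would fail only if none of them match.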

Also, I'm not sure how Logstash implements this, but its performance degrades quite gracefully with multiple rules: in this particular case, I've seen 1.5x slower throughput with 23 rules compared to one rule. With the ingest node and the on_failure approach described here, I'm getting 9x slower throughput with 23 rules. That said, the ingest node is faster in both cases, so maybe Logstash behaves better proportionally simply because it's heavier to begin with.

Best regards,
Radu

Thank you @radu_gheorghe for the feature suggestion!

This would definitely help in the many cases where you want to be more flexible in which grok patterns you match, especially when the input can have many small variations.

I went ahead and created an issue to track this feature's development: https://github.com/elastic/elasticsearch/issues/17903

By the way, since you are one of the first users of Ingest for ES, how has your experience with the API been? Are there any tools you wish you had that would make it easier to manage? Any feedback would be greatly appreciated!

Hi Tal,

Thanks a lot for the quick reply and the issue!

As for your questions, I like the Simulate Pipeline API a lot (I can easily see a UI built on top of it, like those online grok debuggers), and I like how you can specify, per document, the pipeline you want to use. This lets you have multiple pipelines for multiple types of logs without them influencing each other. But enough "teaching", you know this stuff already :smiley: let me move on to the wish.
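First, though, for reference, here's the kind of Simulate request I mean (a minimal sketch; the grok pattern and the test document are invented for illustration):

    POST _ingest/pipeline/_simulate
    {
      "pipeline": {
        "processors": [
          {
            "grok": {
              "field": "message",
              "pattern": "%{COMBINEDAPACHELOG}"
            }
          }
        ]
      },
      "docs": [
        {
          "_source": {
            "message": "127.0.0.1 - - [05/May/2016:12:00:00 +0000] \"GET / HTTP/1.1\" 200 123"
          }
        }
      ]
    }

You get back the documents as they would look after going through the pipeline, which makes debugging grok patterns much less painful.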

I think it would be cool to be able to set a default pipeline (one or more of them) for a specific index (in the index settings/template), so that all logs going to that index get processed by it/them. This way, all tools in the ecosystem would benefit from Ingest without having to be aware of it or maintain configuration (all management would be on the ES side). For example, I couldn't find how to make Filebeat 5.0alpha1 use a pipeline, so I had to resort to using curl with "handmade" bulks.
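For the record, those "handmade" bulks look something like this (a sketch; the index and pipeline names are made up, and I'm assuming the bulk API accepts the pipeline URL parameter):

    curl -XPOST 'localhost:9200/_bulk?pipeline=cisco-asa' -d '
    {"index":{"_index":"logs","_type":"log"}}
    {"cisco_message":"%ASA-6-302013: Built outbound TCP connection ..."}
    '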

Thanks again!
Radu

Hi Radu,

Thanks for being an Elastic pioneer :slight_smile:

We thought about this too during development and it is likely that we will add it in the future. In order to make Filebeat work with ingest, you configure the elasticsearch output to send parameters to ES. Configuring the pipeline parameter can be done like this:

output:
  elasticsearch:
    hosts: ["localhost:9200"]
    parameters:
      pipeline: "1"

Thanks, Martijn! I didn't know of the pipeline parameter. I looked for it in the config and in the docs and couldn't find it. I suppose it will be added for 5.0GA, right?

Not sure, but this was the PR that added this:

It is part of the example config. Also, the docs for the elasticsearch output mention this:
https://www.elastic.co/guide/en/beats/filebeat/master/elasticsearch-output.html#_parameters

Just not in the context of ingest.

Thanks, Martijn! I didn't notice that (very powerful!) option.