Multiple grok rules in ingest node processor definition

Hey, Elastic pioneer here, testing 5.0alpha1 :slight_smile:

In the ingest node, I was able to define a grok processor, but some log types need multiple grok rules (e.g. firewall logs). In Logstash, you'd simply specify an array of match definitions; if one matches, the line is parsed. For example:

   grok {
     match => [
      "cisco_message", "%{CISCOFW106001}",
      "cisco_message", "%{CISCOFW106006_106007_106010}",
      "cisco_message", "%{CISCOFW106014}"
   ...

For the ingest node, if a grok processor fails, it throws an exception. So the only way I could find to add multiple rules is to chain multiple such processors via on_failure. With more than a couple of rules, the pipeline definition gets pretty hairy:

        "grok": {
          "field": "cisco_message",
          "pattern": "%{CISCOFW106001}",
          "on_failure": [
            {
              "grok": {
                "field": "cisco_message",
                "pattern": "%{CISCOFW106006_106007_106010}",
                "on_failure": [
                  {
                    "grok": {
                      "field": "cisco_message",
                      "pattern": "%{CISCOFW106014}"
                      ...

Would a pattern array make sense, to emulate Logstash's behavior? Should I open an issue on GitHub or is there a better way to handle this already?
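To be concrete, I'm imagining something like this (hypothetical syntax that simply mirrors Logstash's match array, not anything that exists today):

    "grok": {
      "field": "cisco_message",
      "patterns": [
        "%{CISCOFW106001}",
        "%{CISCOFW106006_106007_106010}",
        "%{CISCOFW106014}"
      ]
    }

The first pattern that matches would parse the line, and the processor would fail only if none of them match.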

Also, I'm not sure how Logstash implements this, but its performance degrades quite gracefully with multiple rules: in this particular case, I've seen 1.5x slower throughput with 23 rules compared to one rule. With the ingest node and the on_failure approach described here, I'm getting 9x slower throughput with 23 rules. That said, the ingest node is faster in both cases, so maybe Logstash behaves better proportionally simply because it's heavier to begin with.

Best regards,
Radu

Thank you @radu_gheorghe for the feature suggestion!

This would definitely help in the many cases where you want to be more flexible in which grok patterns you match, especially when the input can have many small variations.

I went ahead and created an issue to track this feature's development: https://github.com/elastic/elasticsearch/issues/17903

By the way, since you are one of the first users of Ingest for ES, how has your experience with the API been? Are there any tools you wish you had that would make it easier to manage? Any feedback would be greatly appreciated!

Hi Tal,

Thanks a lot for the quick reply and the issue!

As for your questions, I like the Simulate Pipeline API a lot (I can easily see a UI built on top of it, like those online grok debuggers), and I like how you can specify, per document, the pipeline you want to use. This lets you have multiple pipelines for multiple types of logs without them influencing each other. But enough "teaching", you know this stuff already :smiley: let me move on to the wish.
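First, though, for reference, here's the kind of Simulate request I mean (a minimal sketch; the grok pattern and the test document are invented for illustration):

    POST _ingest/pipeline/_simulate
    {
      "pipeline": {
        "processors": [
          {
            "grok": {
              "field": "message",
              "pattern": "%{COMBINEDAPACHELOG}"
            }
          }
        ]
      },
      "docs": [
        {
          "_source": {
            "message": "127.0.0.1 - - [05/May/2016:12:00:00 +0000] \"GET / HTTP/1.1\" 200 123"
          }
        }
      ]
    }

You get back the documents as they would look after going through the pipeline, which makes debugging grok patterns much less painful.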

I think it would be cool to be able to set a default pipeline (one or more of them) for a specific index (in the index settings/template), so that all logs going to that index get processed by it/them. This way, all tools in the ecosystem would benefit from Ingest without having to be aware of it or maintain configuration (all management would be on the ES side). For example, I couldn't find how to make Filebeat 5.0alpha1 use a pipeline, so I had to resort to using curl with "handmade" bulks.
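For the record, those "handmade" bulks look something like this (a sketch; the index and pipeline names are made up, and I'm assuming the bulk API accepts the pipeline URL parameter):

    curl -XPOST 'localhost:9200/_bulk?pipeline=cisco-asa' -d '
    {"index":{"_index":"logs","_type":"log"}}
    {"cisco_message":"%ASA-6-302013: Built outbound TCP connection ..."}
    '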

Thanks again!
Radu

Hi Radu,

Thanks for being an Elastic pioneer :slight_smile:

We thought about this too during development and it is likely that we will add it in the future. In order to make Filebeat work with ingest, you configure the elasticsearch output to send parameters to ES. Configuring the pipeline parameter can be done like this:

output:
  elasticsearch:
    hosts: ["localhost:9200"]
    parameters:
      pipeline: "1"

Thanks, Martijn! I didn't know of the pipeline parameter. I looked for it in the config and in the docs and couldn't find it. I suppose it will be added for 5.0GA, right?

Not sure, but this was the PR that added this:

It is part of the example config. Also, the docs for the elasticsearch output mention this:
https://www.elastic.co/guide/en/beats/filebeat/master/elasticsearch-output.html#_parameters

Just not in the context of ingest.

Thanks, Martijn! I didn't notice that (very powerful!) option.