Pipelines - Losing values with Foreach and Grok


(Josiah Raiche) #1

Similar to this unanswered question: Ingest: foreach + grok processor - how to write result into an array?

I'm trying to parse a multiline document. Each line is one hop of a traceroute, and I'd like to parse each line into an element of an array.

Here's the relevant field, execbeat's exec.stdout for a Windows tracert command:

Tracing route to example.com [123.456.789.10]
over a maximum of 30 hops:

  1    24 ms     1 ms     1 ms  123.456.789.14
  2     2 ms     1 ms     2 ms  example.com [123.456.789.12]
  3     5 ms     1 ms     1 ms  example.com [123.456.789.10]

Trace complete.

Desired output (though I'd also be happy with separate top-level arrays for each key, or even joined strings):

{
...
"hops": [
    {"hop_step": 1, "hop_ip": "123.456.789.14", "rtt_1": 24, "rtt_2": 1},
    {"hop_step": 2, "hop_ip": "123.456.789.12", "rtt_1": 2, "rtt_2": 1}
[... up to 30] 
]
}

Here's my pipeline:

PUT _ingest/pipeline/tracert
{
  "description": "A pipeline for parsing tracert results",
  "processors": [
	{
		"set": {
			"field": "hops",
			"value": ""
		}
	},
	{
		"split": {
			"field": "exec.stdout",
			"separator": "[\r\n]+"
		}
	},
	{
		"foreach": {
			"field": "exec.stdout",
			"processor": {
				"grok": {
					"field": "_ingest._value",
					"patterns": ["%{POSINT:hop_step}\\s+%{POSINT:rtt_1} ms\\s+ %{POSINT:rtt_2} ms\\s+ %{POSINT:rtt_3} ms\\s+%{USERNAME:hop_server_name}? \\[?%{IP:hop_ip}"],
					"ignore_failure" : true
				}
			}
		}
	}
  ]
}

Grok works great and discards the non-matching lines, retaining the ones I like. But since foreach accepts only one processor, there's no way to append each line's captured values to an array, so they are overwritten by every subsequent matching line.
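
To make the overwrite concrete, here is a rough plain-Python model of the behaviour described above. It is not Elasticsearch code: `HOP_RE` is a hand-written stand-in for the grok pattern (with a simplified IP match), and `foreach_grok` is a hypothetical helper that only imitates grok writing its captures to the same top-level keys on every iteration.

```python
import re

# Stand-in for the grok pattern in the pipeline (assumption: simplified IP match).
HOP_RE = re.compile(
    r"(?P<hop_step>[1-9][0-9]*)\s+(?P<rtt_1>[1-9][0-9]*) ms\s+"
    r"(?P<rtt_2>[1-9][0-9]*) ms\s+(?P<rtt_3>[1-9][0-9]*) ms\s+"
    r"(?P<hop_server_name>[a-zA-Z0-9._-]+)? \[?(?P<hop_ip>[0-9.]+)"
)

def foreach_grok(doc):
    """Imitate foreach + grok: each match writes to the same top-level keys."""
    for line in doc["exec"]["stdout"]:
        m = HOP_RE.search(line)
        if m is None:
            continue  # like ignore_failure: non-matching lines are skipped
        # Only captures that actually matched are written to the document.
        doc.update({k: v for k, v in m.groupdict().items() if v is not None})
    return doc

doc = {"exec": {"stdout": [
    "Tracing route to example.com [123.456.789.10]",
    "over a maximum of 30 hops:",
    "  1    24 ms     1 ms     1 ms  123.456.789.14",
    "  2     2 ms     1 ms     2 ms  example.com [123.456.789.12]",
    "  3     5 ms     1 ms     1 ms  example.com [123.456.789.10]",
    "Trace complete.",
]}}

result = foreach_grok(doc)
print(result["hop_step"], result["hop_ip"])  # only the last hop survives
```

Hops 1 and 2 match as well, but their values are replaced as soon as the next line matches, which is the behaviour I'm trying to avoid.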

This seems like a common case; how can I do this?


(Jake Landis) #2

Unfortunately, this is not possible using just Foreach and Grok, for the reasons you mention.

However, it is possible to do this in Painless with the script processor.

You will have to enable regular expressions in your Elasticsearch config (script.painless.regex.enabled: true) and manually map each Grok pattern to its underlying regex, as defined here.
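
As an illustration of that manual mapping, here is a small Python sketch that expands `%{NAME:field}` references into plain regex. The `POSINT` and `USERNAME` definitions are the ones that appear in the regex used in this thread; the `IP` entry is a deliberately simplified assumption (the real grok definition is much longer), and `grok_to_regex` is a hypothetical helper, not part of any library. Python's `(?P<name>...)` named-group syntax is used here only for readability.

```python
import re

GROK = {
    "POSINT": r"\b(?:[1-9][0-9]*)\b",
    "USERNAME": r"[a-zA-Z0-9._-]+",
    "IP": r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}",  # simplified assumption
}

def grok_to_regex(pattern):
    """Replace each %{NAME:field} with a named group wrapping NAME's regex."""
    def expand(m):
        name, field = m.group(1), m.group(2)
        return "(?P<%s>%s)" % (field, GROK[name])
    return re.sub(r"%\{(\w+):(\w+)\}", expand, pattern)

regex = grok_to_regex(r"%{POSINT:hop_step}\s+%{POSINT:rtt_1} ms")
print(regex)
print(re.search(regex, "  1    24 ms").groupdict())
```

The printed regex shows what each grok reference turns into, which is the text you would then paste (with numbered groups) into the Painless script.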

I took a quick swing at it; hopefully this is a starting point for implementing something like this (the example as-is is fragile and meant only as a getting-started example):

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "description": "test",
    "processors": [
      {
        "script": {
          "lang": "painless",
          "source": """
            // Guard against documents that have no exec.stdout.
            String stdout = ctx.exec?.stdout;
            List hops = new ArrayList();
            if (stdout != null) {
              // Leading whitespace and the "host [" prefix are optional, so bare-IP hops match too.
              Pattern hopPattern = /^\s*([1-9][0-9]*)\s+([1-9][0-9]*) ms\s+([1-9][0-9]*) ms\s+([1-9][0-9]*) ms\s+(?:([a-zA-Z0-9._-]+)\s+\[)?([0-9a-fA-F.:]+)\]?\s*$/;
              for (String line : /[\r\n]+/.split(stdout)) {
                Matcher matcher = hopPattern.matcher(line);
                // Lines that do not look like a hop (headers, "Trace complete.") are skipped.
                if (matcher.matches()) {
                  Map hop = new HashMap();
                  hop.put("hop", matcher.group(1));
                  hop.put("rtt_1", matcher.group(2));
                  hop.put("rtt_2", matcher.group(3));
                  hop.put("rtt_3", matcher.group(4));
                  // Group 5 is null when the hop has no host name.
                  if (matcher.group(5) != null) {
                    hop.put("user_name", matcher.group(5));
                  }
                  hop.put("ip", matcher.group(6));
                  hops.add(hop);
                }
              }
            }
            ctx.hops = hops;
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "test",
      "_type": "_doc",
      "_id": "99",
      "_source": {
        "exec": {
          "stdout": "Tracing route to example.com [123.456.789.10]\nover a maximum of 30 hops:\n1    24 ms     1 ms     1 ms  123.456.789.14\n2     2 ms     1 ms     2 ms  example.com [123.456.789.12]\n3     5 ms     1 ms     1 ms  example.com [123.456.789.10]"
        }
      }
    }
  ]
}
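
If it helps to iterate on the pattern outside of Elasticsearch first, the same line-by-line approach can be sketched in Python. This is only a test harness under assumptions: `HOP_RE` is my own variant of the pattern (it also accepts hops with no bracketed host name and leading whitespace), `parse_hops` is a hypothetical helper, and the key names follow the desired output from the question rather than the script above.

```python
import re

# Hypothetical pattern for one tracert hop line; the host name is optional.
HOP_RE = re.compile(
    r"^\s*(?P<hop_step>[1-9][0-9]*)"
    r"\s+(?P<rtt_1>[1-9][0-9]*) ms"
    r"\s+(?P<rtt_2>[1-9][0-9]*) ms"
    r"\s+(?P<rtt_3>[1-9][0-9]*) ms"
    r"\s+(?:(?P<hop_server_name>[a-zA-Z0-9._-]+)\s+\[)?"
    r"(?P<hop_ip>[0-9a-fA-F.:]+)\]?\s*$"
)

def parse_hops(stdout):
    """Return one dict per hop line; lines that do not match are skipped."""
    hops = []
    for line in re.split(r"[\r\n]+", stdout):
        m = HOP_RE.match(line)
        if m is None:
            continue
        # Drop captures that did not participate (no host name on this hop).
        hop = {k: v for k, v in m.groupdict().items() if v is not None}
        for key in ("hop_step", "rtt_1", "rtt_2", "rtt_3"):
            hop[key] = int(hop[key])
        hops.append(hop)
    return hops

SAMPLE = (
    "Tracing route to example.com [123.456.789.10]\n"
    "over a maximum of 30 hops:\n"
    "\n"
    "  1    24 ms     1 ms     1 ms  123.456.789.14\n"
    "  2     2 ms     1 ms     2 ms  example.com [123.456.789.12]\n"
    "  3     5 ms     1 ms     1 ms  example.com [123.456.789.10]\n"
    "\n"
    "Trace complete.\n"
)

hops = parse_hops(SAMPLE)
for hop in hops:
    print(hop)
```

Like the Painless version, this is still fragile: Windows tracert also prints round-trip times such as "<1 ms" and "*" for timeouts, and neither form is handled by this pattern.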
