Multiple patterns for dissect processor in the pipeline

Hello,
We have logs in several similar but not identical formats, and we would like to use the ingest pipeline dissect processor with multiple patterns to parse them. We have already tried grok, but our ingestion volume is massive and the performance is horrible.

This would be the perfect solution but it doesn't exist:

{
  "dissect": {
    "if": "ctx.tags.contains('httpd') && ctx.tags.contains('isprime') && ctx.tags.contains('access')",
    "field": "message",
    "patterns": [
      "%{apache2.access.domain} %{apache2.access.vhost} \"%{apache2.access.filename}\" %{apache2.access.remote_ip} %{?auth} %{apache2.access.user} [%{apache2.access.timestamp}] \"%{apache2.access.method} %{apache2.access.url} HTTP/%{apache2.access.http_version}\" %{apache2.access.response_code} %{apache2.access.body_sent.bytes} \"%{apache2.access.referrer}\" \"%{apache2.access.user_agent}\" \"%{apache2.access.xff}\" \"%{apache2.access.protocol}\"",
      "%{apache2.access.domain} %{apache2.access.vhost} \"%{apache2.access.filename}\" %{apache2.access.remote_ip} %{?auth} %{apache2.access.user} [%{apache2.access.timestamp}] \"%{apache2.access.method} %{apache2.access.url} HTTP/%{apache2.access.http_version}\" %{apache2.access.response_code} %{apache2.access.body_sent.bytes} \"%{apache2.access.referrer}\" \"%{apache2.access.user_agent}\"",
      "%{apache2.access.domain} %{apache2.access.vhost} \"%{apache2.access.filename}\" %{apache2.access.remote_ip} %{?auth} %{apache2.access.user} [%{apache2.access.timestamp}] \"%{apache2.access.method} %{apache2.access.url} RTSP/%{apache2.access.rtsp_version}\" %{apache2.access.response_code} %{apache2.access.body_sent.bytes} \"%{apache2.access.referrer}\" \"%{apache2.access.user_agent}\" \"%{apache2.access.xff}\" \"%{apache2.access.protocol}\""
    ],  
    "ignore_missing" : true
  }
}

Does anybody know if there is a similar solution or a different one to solve this? Thanks

Mario

Hi @mmartinez

No, there is no multi-pattern support in dissect today; perhaps you could open a feature request.

I think there are a couple of approaches:

  1. Three dissect processors in cascade (in order), with a simple if condition on the 2nd and 3rd that tests whether a field from the previous one exists. This will still be pretty efficient.

  2. The 1st and 3rd patterns appear to be the same other than 1 field and 1 constant, so you could certainly use a single dissect and then rename the field later.

  3. Combine 1 & 2

If I am reading them correctly, the 1st and 3rd could be combined into the pattern below.
Then rename the field based on the value of %{apache2.protocol}

"%{apache2.access.domain} %{apache2.access.vhost} \"%{apache2.access.filename}\" %{apache2.access.remote_ip} %{?auth} %{apache2.access.user} [%{apache2.access.timestamp}] \"%{apache2.access.method} %{apache2.access.url} %{apache2.protocol}/%{apache2.access.http_version}\" %{apache2.access.response_code} %{apache2.access.body_sent.bytes} \"%{apache2.access.referrer}\" \"%{apache2.access.user_agent}\" \"%{apache2.access.xff}\" \"%{apache2.access.protocol}\""
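
A rough, untested sketch of that rename step follows. It assumes the dissect has already run and that apache2.protocol holds either HTTP or RTSP. The rename and remove processors and the description option are standard ingest features, but the condition is only my guess at what you want:

```json
[
  {
    "rename": {
      "description": "Assumption: apache2.protocol was captured by the combined dissect pattern",
      "if": "ctx.apache2?.protocol == 'RTSP'",
      "field": "apache2.access.http_version",
      "target_field": "apache2.access.rtsp_version"
    }
  },
  {
    "remove": {
      "description": "Drop the helper field once the version field has its final name",
      "field": "apache2.protocol",
      "ignore_missing": true
    }
  }
]
```

These two entries would go in the processors list right after the dissect.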

Thanks Stephen!

I will work on that and let you know whether I succeed :slightly_smiling_face:

Hi @stephenb

In the end I did this, but I'm not sure I'm following the correct order.

{
  "dissect": {
    "if": "ctx.tags.contains('httpd') && ctx.tags.contains('isprime') && ctx.tags.contains('access')",
    "field": "message",
    "pattern": "%{apache2.access.domain} %{apache2.access.vhost} \"%{apache2.access.filename}\" %{apache2.access.remote_ip} %{?auth} %{apache2.access.user} [%{apache2.access.timestamp}] \"%{apache2.access.method} %{apache2.access.url} %{?apache2.access.http_todelete}/%{apache2.access.http_version}\" %{apache2.access.response_code} %{apache2.access.body_sent.bytes} \"%{apache2.access.referrer}\" \"%{apache2.access.user_agent}\" \"%{apache2.access.xff}\" \"%{apache2.access.protocol}\"",  
    "ignore_missing" : true
  }
},
{
  "dissect": {
    "if": "ctx.tags.contains('httpd') && ctx.tags.contains('isprime') && ctx.tags.contains('access') && ! ctx.containsKey('apache2.access.protocol')",
    "field": "message",
    "pattern": "%{apache2.access.domain} %{apache2.access.vhost} \"%{apache2.access.filename}\" %{apache2.access.remote_ip} %{?auth} %{apache2.access.user} [%{apache2.access.timestamp}] \"%{apache2.access.method} %{apache2.access.url} %{?apache2.access.http_todelete}/%{apache2.access.http_version}\" %{apache2.access.response_code} %{apache2.access.body_sent.bytes} \"%{apache2.access.referrer}\" \"%{apache2.access.user_agent}\"",  
    "ignore_missing" : true
  }
},
{
  "dissect": {
    "if": "ctx.tags.contains('httpd') && ctx.tags.contains('isprime') && ctx.tags.contains('access') && ! ctx.containsKey('apache2.access.user_agent')",
    "field": "message",
    "pattern": "%{apache2.access.domain} %{apache2.access.vhost} \"%{apache2.access.filename}\" %{apache2.access.remote_ip} %{?auth} %{apache2.access.user} [%{apache2.access.timestamp}] \"%{apache2.access.url} HTTP/%{apache2.access.http_version}\" %{apache2.access.response_code} %{apache2.access.body_sent.bytes} \"%{apache2.access.referrer}\" \"%{apache2.access.user_agent}\" \"%{apache2.access.xff}\" \"%{apache2.access.protocol}\"",  
    "ignore_missing" : true
  }
},

You just need to look at your data and put the most common pattern first, then the 2nd most common, then the 3rd.
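
For the ordering, here is a minimal sketch of the cascade (untested). The two patterns are shortened stand-ins, not your real ones, and the tags check is left out for brevity. It adds ignore_failure because, as far as I know, dissect throws when a line does not match its pattern, and ignore_missing only covers a missing message field. The if uses null-safe access because dotted dissect keys should end up as nested objects, so I would not expect ctx.containsKey('apache2.access.protocol') to find them:

```json
{
  "processors": [
    {
      "dissect": {
        "description": "Most common format first (stand-in pattern, replace with the real one)",
        "field": "message",
        "pattern": "%{apache2.access.remote_ip} \"%{apache2.access.user_agent}\" \"%{apache2.access.xff}\"",
        "ignore_missing": true,
        "ignore_failure": true
      }
    },
    {
      "dissect": {
        "description": "Runs only if the first dissect extracted nothing (stand-in pattern)",
        "if": "ctx.apache2?.access?.remote_ip == null",
        "field": "message",
        "pattern": "%{apache2.access.remote_ip} \"%{apache2.access.user_agent}\"",
        "ignore_missing": true,
        "ignore_failure": true
      }
    }
  ]
}
```

You can check the order against a few sample lines with the simulate pipeline API before switching it on.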

I'm a little confused why you need all 3, though. With the skip key below, aren't 1 and 3 the same?

%{?apache2.access.http_todelete}
