CombinedApacheLog pattern where request is without parameters

Hello, I have set up a Filebeat, Elasticsearch and Kibana stack, but in Kibana/Elasticsearch I can see that the request field still contains the query parameters, which makes it hard to collect all data for a specific URL. I did find the URIPATH pattern, but I don't know if it is possible to chain the COMBINEDAPACHELOG output to another grok pattern in an ingest pipeline.
Any pointers?
Regards
Ronald

You can add further grok patterns to the existing ones. Example: my-new-pattern is added to the existing patterns:

    "grok": {
      "field": "message",
      "patterns":[
        "%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:apache2.access.method} %{DATA:apache2.access.url} HTTP/%{NUMBER:apache2.access.http_version}\" %{NUMBER:apache2.access.response_code} (?:%{NUMBER:apache2.access.body_sent.bytes}|-)( \"%{D
ATA:apache2.access.referrer}\")?( \"%{DATA:apache2.access.agent}\")?",
        "%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"-\" %{NUMBER:apache2.access.response_code} -",
         "my-new-pattern"
        ],
    }
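
For the original question of splitting the request into path and parameters, the extra entry could also be a second grok processor that runs on the already-extracted URL field, instead of a whole new message pattern. A minimal sketch (apache2.access.url comes from the module pattern above; apache2.access.path and apache2.access.params are made-up target field names):

    "grok": {
      "field": "apache2.access.url",
      "patterns": [
        "%{URIPATH:apache2.access.path}(?:%{URIPARAM:apache2.access.params})?"
      ],
      "ignore_missing": true
    }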

After adding it, do not forget to update the pipeline stored in Elasticsearch. By default Filebeat does not overwrite a pipeline that has already been loaded, even if it has changed on disk, so you must update it manually.

Can you tell me how to do such an update? I have been stuck for some hours with a custom-made pattern where I always get a "Provided Grok expressions do not match field value" error in Kibana. But I tried the pattern in the grok test tool http://grokconstructor.appspot.com/do/match#result and there the pattern works perfectly.

I usually use the Simulate API or the Grok Debugger in X-Pack Basic.
Simulate API: https://www.elastic.co/guide/en/elasticsearch/reference/master/simulate-pipeline-api.html
Grok debugger: https://www.elastic.co/guide/en/kibana/current/xpack-grokdebugger.html
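
A minimal Simulate API call looks like this (a sketch: the pattern and the sample document are placeholders to replace with your own):

    POST _ingest/pipeline/_simulate
    {
      "pipeline": {
        "processors": [
          {
            "grok": {
              "field": "message",
              "patterns": ["%{IPORHOST:clientip} %{GREEDYDATA:rest}"]
            }
          }
        ]
      },
      "docs": [
        { "_source": { "message": "8.8.8.8 hello world" } }
      ]
    }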

A possible gotcha is that in pipeline.json files you must escape backslashes, whereas this is not required in other Grok testers. So you could try changing every \ into \\.
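
For example, a fragment that a Grok tester accepts as

    \[%{HTTPDATE:timestamp}\]

has to be written inside pipeline.json as

    \\[%{HTTPDATE:timestamp}\\]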

If these do not help, feel free to share a sample log line and your grok pattern, so we can work out a solution.
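
As for how to update the pipeline itself: one manual way is to delete the stored pipeline and restart Filebeat, which should load the changed version again. A sketch (the exact pipeline id is an assumption here; it shows up in Filebeat's debug output, e.g. filebeat-6.0.0-perf-access-pipeline later in this thread):

    curl -XDELETE 'http://localhost:9200/_ingest/pipeline/filebeat-6.0.0-perf-access-pipeline'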

Hello Noémi, thanks for your reply. I copied the Apache Filebeat configuration and tried to adapt it to my log file (for testing purposes I took an example line from somewhere to be sure the pattern is correct).

This is my /usr/share/filebeat/module/perf/access/manifest.yml:

module_version: 1.0

var:
  - name: paths
    default:
      - /var/log/httpd/perf.log
      - /var/log/apache2/other_vhosts_access.log*
    os.darwin:
      - /usr/local/var/log/apache2/access_log*
    os.windows:
      - "C:/tools/Apache/httpd-2.*/Apache24/logs/access.log*"
      - "C:/Program Files/Apache Software Foundation/Apache2.*/logs/access.log*"

ingest_pipeline: ingest/pipeline.json
prospector: config/perf.yml

This is the /usr/share/filebeat/module/perf/access/ingest/pipeline.json (I double-checked that this pipeline.json is the one being loaded: when I renamed it, I got an error that the file was not found):

{
  "description": "Pipeline for parsing Apache2 performance logs. Requires the geoip and user_agent plugins.",
  "processors": [{
    "grok": {
      "field": "message",
      "patterns":[
          "%{TIMESTAMP_ISO8601:perf.access.timestamp} \\[%{IPV4:ip};%{WORD:perf.access.environment}\\] %{LOGLEVEL:perf.access.log_level} %{GREEDYDATA:perf.access.message}"
      ],
      "ignore_missing": false
    }
  },{
    "remove":{
      "field": "message"
    }
  }],
  "on_failure" : [{
    "set" : {
      "field" : "error.message",
      "value" : "{{ _ingest.on_failure_message }}"
    }
  }]
}

This is the log file /var/log/httpd/perf.log (I filled it in by hand for testing):

2016-09-19T18:19:00 [8.8.8.8:prd] DEBUG this is an example log message
2016-09-19T18:19:00 [8.8.8.8:prd] DEBUG this is an example log message

This is the output when I run: filebeat -e -modules=perf -d "*"

2017/12/06 20:21:59.321210 prospector.go:350: DBG Check file for harvesting: /var/log/httpd/perf.log
2017/12/06 20:21:59.321224 prospector.go:436: DBG Update existing file for harvesting: /var/log/httpd/perf.log, offset: 71
2017/12/06 20:21:59.321231 prospector.go:488: DBG Harvester for file is still running: /var/log/httpd/perf.log
2017/12/06 20:21:59.321239 prospector.go:157: DBG Prospector states cleaned up. Before: 1, After: 1
2017/12/06 20:22:04.322709 log.go:85: DBG End of file reached: /var/log/httpd/perf.log; Backoff now.
2017/12/06 20:22:09.321364 prospector.go:140: DBG Run prospector
2017/12/06 20:22:09.321387 prospector.go:136: DBG Start next scan
2017/12/06 20:22:09.321431 prospector.go:350: DBG Check file for harvesting: /var/log/httpd/perf.log
2017/12/06 20:22:09.321443 prospector.go:436: DBG Update existing file for harvesting: /var/log/httpd/perf.log, offset: 71
2017/12/06 20:22:09.321449 prospector.go:488: DBG Harvester for file is still running: /var/log/httpd/perf.log
2017/12/06 20:22:09.321457 prospector.go:157: DBG Prospector states cleaned up. Before: 1, After: 1
2017/12/06 20:22:14.322957 processor.go:262: DBG Publish event: {
  "@timestamp": "2017-12-06T20:22:14.322Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.0.0",
    "pipeline": "filebeat-6.0.0-perf-access-pipeline"
  },
  "message": "2016-09-19T18:19:00 [8.8.8.8:prd] DEBUG this is an example log message",
  "source": "/var/log/httpd/perf.log",
  "offset": 142,
  "fileset": {
    "module": "perf",
    "name": "access"
  },
  "prospector": {
    "type": "log"
  },
  "beat": {
    "name": "enac2-dev",
    "hostname": "enac2-dev",
    "version": "6.0.0"
  }
}
2017/12/06 20:22:14.323011 log.go:85: DBG End of file reached: /var/log/httpd/perf.log; Backoff now.
2017/12/06 20:22:15.323169 log.go:85: DBG End of file reached: /var/log/httpd/perf.log; Backoff now.
2017/12/06 20:22:15.326545 client.go:282: DBG PublishEvents: 1 events have been  published to elasticsearch in 3.352298ms.
2017/12/06 20:22:15.326578 logger.go:29: DBG ackloop: receive ack [1: 0, 1]
2017/12/06 20:22:15.326589 logger.go:29: DBG broker ACK events: count=1, start-seq=2, end-seq=2
2017/12/06 20:22:15.326597 logger.go:18: DBG ackloop: return ack to broker loop:1
2017/12/06 20:22:15.326603 logger.go:18: DBG ackloop:  done send ack
2017/12/06 20:22:15.326623 registrar.go:200: DBG Processing 1 events
2017/12/06 20:22:15.326633 registrar.go:195: DBG Registrar states cleaned up. Before: 2, After: 2
2017/12/06 20:22:15.326638 registrar.go:228: DBG Write registry file: /var/lib/filebeat/registry
2017/12/06 20:22:15.327260 registrar.go:253: DBG Registry file updated. 2 states written.
2017/12/06 20:22:17.323368 log.go:85: DBG End of file reached: /var/log/httpd/perf.log; Backoff now.
2017/12/06 20:22:19.310159 metrics.go:39: INFO Non-zero metrics in the last 30s: beat.memstats.gc_next=4194304 beat.memstats.memory_alloc=2487336 beat.memstats.memory_total=4269592 filebeat.events.added=5 filebeat.events.done=5 filebeat.harvester.open_files=1 filebeat.harvester.running=1 filebeat.harvester.started=1 libbeat.config.module.running=0 libbeat.config.reloads=1 libbeat.output.read.bytes=1611 libbeat.output.type=elasticsearch libbeat.output.write.bytes=1673 libbeat.pipeline.clients=3 libbeat.pipeline.events.active=0 libbeat.pipeline.events.filtered=3 libbeat.pipeline.events.published=2 libbeat.pipeline.events.retry=1 libbeat.pipeline.events.total=5 libbeat.pipeline.queue.acked=2 registrar.states.current=2 registrar.states.update=5 registrar.writes=5
2017/12/06 20:22:19.321564 prospector.go:140: DBG Run prospector

This is the JSON I see in Kibana:

{
  "_index": "filebeat-6.0.0-2017.12.06",
  "_type": "doc",
  "_id": "b7WFLWAB_HTHudAEojbx",
  "_score": 1,
  "_source": {
    "@timestamp": "2017-12-06T20:30:20.674Z",
    "offset": 213,
    "beat": {
      "hostname": "enac2-dev",
      "name": "enac2-dev",
      "version": "6.0.0"
    },
    "prospector": {
      "type": "log"
    },
    "source": "/var/log/httpd/perf.log",
    "message": "2016-09-19T18:19:00 [8.8.8.8:prd] DEBUG this is an example log message",
    "fileset": {
      "module": "perf",
      "name": "access"
    },
    "error": {
      "message": "Provided Grok expressions do not match field value: [2016-09-19T18:19:00 [8.8.8.8:prd] DEBUG this is an example log message]"
    }
  },
  "fields": {
    "@timestamp": [
      "2017-12-06T20:30:20.674Z"
    ]
  }
}

To me it seems that the pattern is completely ignored, but I can't figure out why and I don't know where to investigate further.

Thankful for any hint

Rolf
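
A quick way to narrow this down is to run the exact same grok pattern against the sample line with the Simulate API mentioned above (a sketch; the pattern is copied verbatim from the pipeline.json):

    POST _ingest/pipeline/_simulate
    {
      "pipeline": {
        "processors": [
          {
            "grok": {
              "field": "message",
              "patterns": [
                "%{TIMESTAMP_ISO8601:perf.access.timestamp} \\[%{IPV4:ip};%{WORD:perf.access.environment}\\] %{LOGLEVEL:perf.access.log_level} %{GREEDYDATA:perf.access.message}"
              ]
            }
          }
        ]
      },
      "docs": [
        { "_source": { "message": "2016-09-19T18:19:00 [8.8.8.8:prd] DEBUG this is an example log message" } }
      ]
    }

This reproduces the same "do not match" error: the sample line separates the IP and the environment with a colon (8.8.8.8:prd), while the pattern expects a semicolon (;).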

Thanks for that insight. I have now changed my pipeline to the following:

{
  "description" : "Ingest pipeline for Apache httpd Combined Log Format",
  "processors" : [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{URIPATH:requestpath}(?:%{URIPARAM:requestparameters})?(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-)"]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": [ "dd/MMM/YYYY:HH:mm:ss Z" ]
      }
    },
    {
      "geoip": {
        "field": "clientip"
      }
    },
    {
      "user_agent": {
        "field": "agent"
      }
    }
  ]
}

When I run this pattern in the grok debugger I see that my request is indeed split up, yet in the final Elasticsearch documents I do not see those fields. Is there perhaps a setting that helps me see what message is finally ingested by Elasticsearch?
Regards
Ronald
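
One option is the verbose mode of the Simulate API, which shows the document after every single processor in the pipeline. A sketch against the pipeline Filebeat installed (the pipeline id is an assumption; the real one appears in Filebeat's debug output as @metadata.pipeline):

    POST _ingest/pipeline/filebeat-6.0.0-apache2-access-default/_simulate?verbose=true
    {
      "docs": [
        { "_source": { "message": "8.8.8.8 - - [19/Sep/2016:18:19:00 +0200] \"GET /index.html?a=1 HTTP/1.1\" 200 123" } }
      ]
    }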

You should try \\[%{HTTPDATE:timestamp}\\] instead of \[%{HTTPDATE:timestamp}\].

Thanks, I updated my pattern to double-escape the backslashes. When I run it in _ingest/simulate it works, but after running Filebeat the timestamp is set to the moment I ran the filebeat command (see my filebeat.yml and the simulate result). So how can I tell Elasticsearch to use the correct timestamp? Or alternatively, does someone have a good example of using Filebeat to parse Apache logs where the request is split into separate fields for requests with and without parameters?
thanks
Ronald
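
Regarding the timestamp question: in an ingest pipeline it is the date processor that writes the parsed time into @timestamp (its default target field), so if @timestamp still shows the ingest time, the stored pipeline most likely does not yet contain your date processor. A sketch with the target field spelled out explicitly (same field and format as in the pipeline above):

    "date": {
      "field": "timestamp",
      "target_field": "@timestamp",
      "formats": [ "dd/MMM/YYYY:HH:mm:ss Z" ]
    }

If simulate shows the right @timestamp but the indexed documents do not, deleting the stored pipeline as described earlier and restarting Filebeat forces the changed version to be reloaded.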

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.