Parse PostgreSQL CSV log

azlev · December 27, 2020, 2:45pm

Hello.

At work we are consuming the PostgreSQL CSV log instead of plain log. This poses some advantages, like a better formatted input file, stable format, and easy to turn it on.

To achieve it, we are using the filebeat processors, first decoding a multiline CSV then break each field.

Now I'm looking to incorporate this as a filebeat code. Is someone working on anything similar? Should I look at beats/filebeat/module and replicate this structure?

Thanks in advance.

ylasri · December 27, 2020, 4:57pm

The default module of filebeat for postgresql export only some fields from the stderr default log format.
You will need to map all csv fields of postgresql log format to the ESC schema :

field	type	ecs mapping	example
log_time	timestamp(3) with time zone	postgresql.log.timestamp	2020-12-27 08:43:50.674 PST
user_name	text	user.name	postgres
database_name	text	postgresql.log.database	postgres
process_id	integer	process.pid	9004
connection_from	text	?	::1:53881
session_id	text	?	5fe8b9c6.232c
session_line_num	bigint	?	1
command_tag	text	?	authentication
session_start_time	timestamp(3) with time zone	?	2020-12-27 08:43:50 PST
virtual_transaction_id	text	?	9/57
transaction_id	bigint	?	0
error_severity	text	log.level	FATAL
sql_state_code	text	postgresql.log.error.code	28000
message	text	log.message	no pg_hba.conf entry for host "::1", user "postgres", database "postgres", SSL off
detail	text	?
hint	text	?
internal_query	text	?
internal_query_pos	integer	?
context	text	?
query	text	postgresql.log.query
query_pos	integer	?
location	text	?
application_name	text	?

azlev · December 27, 2020, 5:51pm

That's a starting point. The other one is the PostgreSQL documentation here: https://www.postgresql.org/docs/current/runtime-config-logging.html

Note in PostgreSQL 13 there is a new field, called backend_type.

Actually there are more fields than that, like duration and special formatted fields like automatic vacuums and checkpoints.

I'll glue this in a PR and gather distinct log lines to create tests cases.

ylasri · December 27, 2020, 5:52pm

You can write your own module for this and send it as a PR
Something like this may help
Note that when you write your own module in beat, you use mostly ingest processor of elasticsearch instead of beat processors
When the beat is runned for the first time, the ingest pipeline ($BEAT_HOME/modules/${MODULE_NAME}/${STREAM_TYPE}/ingest/pipeline.yml) is loaded into elasticsearch

description: Pipeline for parsing PostgreSQL CSV logs.
processors:
- set:
    field: event.ingested
    value: '{{_ingest.timestamp}}'
- csv:
    field: message
    separator: ","
    target_fields: ["postgresql.log.timestamp", "user.name", "postgresql.log.database", "process.pid", "temp.connection_from", "temp.session_id", "temp.session_line_num", "temp.command_tag", "temp.session_start_time", "temp.virtual_transaction_id", "temp.transaction_id", "log.level", "postgresql.log.error.code", "log.message", "temp.detail", "temp.hint", "temp.internal_query", "temp.internal_query_pos", "temp.context", "postgresql.log.query", "temp.query_pos", "temp.location", "temp.application_name"]
    ignore_missing: true
    trim: true
- date:
    field: postgresql.log.timestamp
    target_field: '@timestamp'
    formats:
    - yyyy-MM-dd HH:mm:ss.SSS zz
    - yyyy-MM-dd HH:mm:ss zz
- convert:
    field: postgresql.log.error.code
    type: integer
- script:
    lang: painless
    source: ctx.event.duration = Math.round(ctx.temp.duration * params.scale)
    params:
      scale: 1000000
    if: ctx.temp?.duration != null
- remove:
    field: temp.duration
    ignore_missing: true
- set:
    field: event.kind
    value: event
- append:
    field: event.category
    value:
      - database
- append:
    field: event.type
    value:
      - info
- append:
    field: event.type
    value:
      - error
    if: "ctx?.postgresql?.log?.error?.code != null && ctx.postgresql.log.error.code >= 02000"
- append:
    field: related.user
    value: "{{user.name}}"
    if: "ctx?.user?.name != null"
on_failure:
- set:
    field: error.message
    value: '{{ _ingest.on_failure_message }}'

system · January 24, 2021, 5:52pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Filebeat doesn't parse postgresql csvlog files Beats filebeat	1	540	March 6, 2019
Filebeats with postgresql module, custom log_line_prefix Beats filebeat	4	1773	November 19, 2019
Filebeat timestamp processor handle timezone abbreviation incorrectly Beats	2	975	July 24, 2020
PostgreSQL Module: What is the "EXPECTED" format (I.E. postgresql.conf settings) Beats filebeat	3	323	November 28, 2019
Abbreviation CST timezone issue when use postgresql filebeat module Beats beats-module	2	318	June 9, 2023

Parse PostgreSQL CSV log

Related topics