Parse PostgreSQL CSV log


At work we are consuming the PostgreSQL CSV log instead of plain log. This poses some advantages, like a better formatted input file, stable format, and easy to turn it on.

To achieve it, we are using the filebeat processors, first decoding a multiline CSV then break each field.

Now I'm looking to incorporate this as a filebeat code. Is someone working on anything similar? Should I look at beats/filebeat/module and replicate this structure?

Thanks in advance.

The default module of filebeat for postgresql export only some fields from the stderr default log format.
You will need to map all csv fields of postgresql log format to the ESC schema :

field type ecs mapping example
log_time timestamp(3) with time zone postgresql.log.timestamp 2020-12-27 08:43:50.674 PST
user_name text postgres
database_name text postgresql.log.database postgres
process_id integer 9004
connection_from text ? ::1:53881
session_id text ? 5fe8b9c6.232c
session_line_num bigint ? 1
command_tag text ? authentication
session_start_time timestamp(3) with time zone ? 2020-12-27 08:43:50 PST
virtual_transaction_id text ? 9/57
transaction_id bigint ? 0
error_severity text log.level FATAL
sql_state_code text postgresql.log.error.code 28000
message text log.message no pg_hba.conf entry for host "::1", user "postgres", database "postgres", SSL off
detail text ?
hint text ?
internal_query text ?
internal_query_pos integer ?
context text ?
query text postgresql.log.query
query_pos integer ?
location text ?
application_name text ?

That's a starting point. The other one is the PostgreSQL documentation here:

Note in PostgreSQL 13 there is a new field, called backend_type.

Actually there are more fields than that, like duration and special formatted fields like automatic vacuums and checkpoints.

I'll glue this in a PR and gather distinct log lines to create tests cases.

You can write your own module for this and send it as a PR
Something like this may help
Note that when you write your own module in beat, you use mostly ingest processor of elasticsearch instead of beat processors
When the beat is runned for the first time, the ingest pipeline ($BEAT_HOME/modules/${MODULE_NAME}/${STREAM_TYPE}/ingest/pipeline.yml) is loaded into elasticsearch

description: Pipeline for parsing PostgreSQL CSV logs.
- set:
    field: event.ingested
    value: '{{_ingest.timestamp}}'
- csv:
    field: message
    separator: ","
    target_fields: ["postgresql.log.timestamp", "", "postgresql.log.database", "", "temp.connection_from", "temp.session_id", "temp.session_line_num", "temp.command_tag", "temp.session_start_time", "temp.virtual_transaction_id", "temp.transaction_id", "log.level", "postgresql.log.error.code", "log.message", "temp.detail", "temp.hint", "temp.internal_query", "temp.internal_query_pos", "temp.context", "postgresql.log.query", "temp.query_pos", "temp.location", "temp.application_name"]
    ignore_missing: true
    trim: true
- date:
    field: postgresql.log.timestamp
    target_field: '@timestamp'
    - yyyy-MM-dd HH:mm:ss.SSS zz
    - yyyy-MM-dd HH:mm:ss zz
- convert:
    field: postgresql.log.error.code
    type: integer
- script:
    lang: painless
    source: ctx.event.duration = Math.round(ctx.temp.duration * params.scale)
      scale: 1000000
    if: ctx.temp?.duration != null
- remove:
    field: temp.duration
    ignore_missing: true
- set:
    field: event.kind
    value: event
- append:
    field: event.category
      - database
- append:
    field: event.type
      - info
- append:
    field: event.type
      - error
    if: "ctx?.postgresql?.log?.error?.code != null && ctx.postgresql.log.error.code >= 02000"
- append:
    field: related.user
    value: "{{}}"
    if: "ctx?.user?.name != null"
- set:
    field: error.message
    value: '{{ _ingest.on_failure_message }}'

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.