Ingesting multiline fields with the Grok processor

Hi

I've cloned the PostgreSQL integration's ingest pipelines and changed their grok patterns to fit our custom PostgreSQL log format.

Almost everything gets parsed correctly with the following grok pattern:

%{POSTGRESQL_REMOTE_HOST:postgresql.remote_host} %{POSTGRESQL_USER:user.name} %{POSTGRESQL_DB_NAME:postgresql.log.database} %{WORD}\[%{NUMBER:process.pid:long}\]: \[%{BASE16FLOAT:postgresql.log.session_line_number:long}\-%{INT}\]%{SPACE}%{WORD:log.level}:  (?:%{POSTGRESQL_ERROR:postgresql.log.sql_state_code}|%{SPACE})(duration: %{NUMBER:temp.duration:float} ms  %{POSTGRESQL_QUERY_STEP}: %{ULTRAGREEDYDATA:postgresql.log.query}|: %{ULTRAGREEDYDATA:tmp.message}|%{ULTRAGREEDYDATA:tmp.message})

However, the greedy-data capture into "postgresql.log.query" is cut off as soon as a newline character is encountered. Here's an example log entry, which contains literal tabs and newlines:

2023-05-09 08:02:03 CEST [local] postgres postgres postgres[375010]: [9-1]   LOG:  duration: 0.182 ms  statement: SELECT * FROM adis
        WHERE navn='Adis Pinjic'
        ;

When I expand the indexed document in the Discover tab and select the JSON output, the message field contains the following:

 "message": "2023-05-09 08:02:03 CEST [local] postgres postgres postgres[375010]: [9-1]   LOG:  duration: 0.182 ms  statement: SELECT * FROM adis\n\tWHERE navn='Adis Pinjic'\n\t;",

Here you can see that postgresql.log.query gets cut off (in the same document):

    "postgresql": {
      "log": {
        "database": "postgres",
        "query": "SELECT * FROM adis",
        "session_line_number": 9,
        "query_step": "statement",
        "timestamp": "2023-05-09 08:02:03 CEST"
      },
      "remote_host": "[local]"
    },

The custom "ULTRAGREEDYDATA" pattern is just a desperate attempt to allow any kind of newline and tab characters. It looks like this:

         "pattern_definitions": {
            "ULTRAGREEDYDATA": """(.|\t|\n|\r|\s)*""", 

The "Custom Logs" integration I use to harvest the postgresql log file has the following multiline configuration in the integration's "Custom configuration":

multiline:
  type: pattern
  pattern: '^\d{4}-\d{2}-\d{2}'
  negate: true
  match: after
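One thing worth ruling out (I'm not certain this applies to your setup): if the integration is using the filestream input rather than the older log input, a top-level `multiline` block is silently ignored, and the settings have to be nested under a `parsers` block instead. A hedged sketch of what that would look like in the Custom configuration field:

```yaml
parsers:
  - multiline:
      type: pattern
      pattern: '^\d{4}-\d{2}-\d{2}'
      negate: true
      match: after
```

If the agent never joins the lines, each continuation line reaches the pipeline as its own event, which would also explain a truncated `postgresql.log.query`.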

Have any of you solved a similar problem in your own clusters, or do you have any pointers for how to debug this further? Our whole stack (ES, Kibana, Logstash and the Elastic Agents) is running 8.6.2.

I've tried recreating the issue in the Grok Debugger, but there the grok pattern has no problem parsing the \n and \t text from the message field ( SELECT * FROM adis\n\tWHERE navn='Adis Pinjic'\n\t;). I suspect I don't quite understand how the Elastic Agent actually feeds the multiline data to Elasticsearch.
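To take the Grok Debugger's idealized input out of the equation, you could run the actual installed pipeline against a document whose message contains real embedded newlines, using the simulate API in Dev Tools (the pipeline ID below is a placeholder; pick yours from GET _ingest/pipeline):

```
POST _ingest/pipeline/logs-postgresql.log-custom/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "2023-05-09 08:02:03 CEST [local] postgres postgres postgres[375010]: [9-1]   LOG:  duration: 0.182 ms  statement: SELECT * FROM adis\n\tWHERE navn='Adis Pinjic'\n\t;"
      }
    }
  ]
}
```

If `postgresql.log.query` comes back complete here, the pipeline is fine and the truncation is happening on the agent side before ingest; if it comes back truncated, the problem is in the pipeline after all.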
