Hi
I've cloned the PostgreSQL integration's ingest pipelines and changed their grok patterns to fit our custom PostgreSQL log format.
Almost everything gets parsed correctly with the following grok pattern:
%{POSTGRESQL_REMOTE_HOST:postgresql.remote_host} %{POSTGRESQL_USER:user.name} %{POSTGRESQL_DB_NAME:postgresql.log.database} %{WORD}\[%{NUMBER:process.pid:long}\]: \[%{BASE16FLOAT:postgresql.log.session_line_number:long}\-%{INT}\]%{SPACE}%{WORD:log.level}: (?:%{POSTGRESQL_ERROR:postgresql.log.sql_state_code}|%{SPACE})(duration: %{NUMBER:temp.duration:float} ms %{POSTGRESQL_QUERY_STEP}: %{ULTRAGREEDYDATA:postgresql.log.query}|: %{ULTRAGREEDYDATA:tmp.message}|%{ULTRAGREEDYDATA:tmp.message})
However, the greedy capture into "postgresql.log.query" is cut off as soon as a newline character is encountered. Here's an example log entry that contains literal tabs and newlines:
2023-05-09 08:02:03 CEST [local] postgres postgres postgres[375010]: [9-1] LOG: duration: 0.182 ms statement: SELECT * FROM adis
WHERE navn='Adis Pinjic'
;
When I expand the indexed document in the Discover tab and select the JSON output, the message field contains the following:
"message": "2023-05-09 08:02:03 CEST [local] postgres postgres postgres[375010]: [9-1] LOG: duration: 0.182 ms statement: SELECT * FROM adis\n\tWHERE navn='Adis Pinjic'\n\t;",
Here you can see that postgresql.log.query gets cut off (in the same document):
"postgresql": {
"log": {
"database": "postgres",
"query": "SELECT * FROM adis",
"session_line_number": 9,
"query_step": "statement",
"timestamp": "2023-05-09 08:02:03 CEST"
},
"remote_host": "[local]"
},
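To rule out a Discover rendering quirk, the document can also be fetched directly in Dev Tools; a minimal sketch (the logs-* index pattern is a placeholder for our data stream, yours may differ):

GET logs-*/_search
{
  "size": 1,
  "query": {
    "match_phrase": {
      "message": "Adis Pinjic"
    }
  }
}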
The custom "ULTRAGREEDYDATA" pattern is just a desperate attempt to match any kind of newline and tab character. It is defined like this:
"pattern_definitions": {
"ULTRAGREEDYDATA": """(.|\t|\n|\r|\s)*""",
The "Custom Logs" integration I use to harvest the postgresql log file has the following multiline configuration in the integration's "Custom configuration":
multiline:
  match: after
  type: pattern
  pattern: '^\d{4}-\d{2}-\d{2}'
  negate: true
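To take the integration out of the equation, one debugging step I'm considering is replaying the log file with a standalone Filebeat using the same multiline settings and printing the assembled events to the console; a minimal sketch, assuming a filestream input (the path and id are placeholders):

filebeat.inputs:
  - type: filestream
    id: postgres-multiline-repro      # hypothetical id, filestream requires one
    paths:
      - /var/log/postgresql/postgresql.log   # placeholder path
    parsers:
      - multiline:
          type: pattern
          pattern: '^\d{4}-\d{2}-\d{2}'
          negate: true
          match: after

# print the assembled events instead of shipping them
output.console:
  pretty: true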
Have any of you solved a similar problem in your own clusters or have any pointers for how to debug this further? Our whole stack is running 8.6.2 (ES, Kibana, Logstash and Elastic Agents).
I've tried recreating the issue in the Grok Debugger; there, the grok pattern has no problem parsing the \n and \t characters in the message field text (SELECT * FROM adis\n\tWHERE navn='Adis Pinjic'\n\t;). I suspect that I don't quite understand how the Elastic Agent actually feeds the multiline data to Elasticsearch.
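My next step is to replay the assembled multiline message through the _simulate API, since that exercises the same grok implementation the real documents hit, unlike the Grok Debugger. A minimal sketch, trimmed to the tail of the pattern (with %{WORD} standing in for the custom POSTGRESQL_QUERY_STEP pattern, so this is a simplification, not the full cloned pipeline):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "duration: %{NUMBER:temp.duration:float} ms %{WORD:postgresql.log.query_step}: %{ULTRAGREEDYDATA:postgresql.log.query}"
          ],
          "pattern_definitions": {
            "ULTRAGREEDYDATA": """(.|\t|\n|\r|\s)*"""
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "2023-05-09 08:02:03 CEST [local] postgres postgres postgres[375010]: [9-1] LOG: duration: 0.182 ms statement: SELECT * FROM adis\n\tWHERE navn='Adis Pinjic'\n\t;"
      }
    }
  ]
}

If this reproduces the truncation, the problem is in the grok processing itself; if it parses the full statement, then the agent must be feeding the pipeline something different from what I expect.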