Gsub processor in ingest pipeline cannot substitute a pattern into newline

andreatera · January 13, 2023, 3:52pm

Hello there,
since we are moving from a Logstash pipeline to an ingest pipeline, we had to rewrite our Logstash gsub pattern as an ingest processor.
Our use case is simple: replace the Unicode character \u2028 with a newline character (\n) so that multi-line messages (for example, stack traces) are displayed correctly.

Previously, in Logstash, we had:

      # Replace the Unicode \u2028 with \n, which Kibana will display as a new line (we write \u2028 instead of \n so that we receive the full log event)
      mutate {
        gsub => [ "message", '\u2028', "
" # Seems that passing a string with an actual newline in it is the only way to make gsub work.
        ]
      }

Now we define it with the gsub processor in an ingest pipeline:

"gsub": {
		"description": "Replace the unicode \u2028 to return character",
		"field": "message",
		"target_field": "message",
		"pattern": "\\u2028",
		"replacement": "\\n"
	  }

Unfortunately, this puts the literal string "\n" in the message; I couldn't find a way to insert an actual newline character.

How can I solve it?

Thanks in advance

stephenb · January 13, 2023, 4:11pm

Hi @andreatera, I tried this:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "gsub": {
          "description": "Replace the unicode \u2028 to return character",
          "field": "message",
          "target_field": "message",
          "pattern": "\u2028",
          "replacement": "\\\n"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Test Message with a \u2028 here"
      }
    }
  ]
}

and got this

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "message": """Test Message with a 
 here"""
        },
        "_ingest": {
          "timestamp": "2023-01-13T16:10:28.321783962Z"
        }
      }
    }
  ]
}

Perhaps that will help / work?

I just ingested a document (i.e. not simulate) and it seems to work...

PUT _ingest/pipeline/discuss-test
{
  "processors": [
    {
      "gsub": {
        "description": "Replace the unicode \u2028 to return character",
        "field": "message",
        "target_field": "message",
        "pattern": "\u2028",
        "replacement": "\\\n"
      }
    }
  ]
}



POST discuss-test/_doc/?pipeline=discuss-test
{
  "message": "Test Message with a \u2028 here"
}

GET discuss-test/_search

#Results
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "discuss-test",
        "_id": "CRrqq4UBurnLzVTzm9Nd",
        "_score": 1,
        "_source": {
          "message": """Test Message with a 
 here"""
        }
      }
    ]
  }
}

Ayush_Mathur · January 13, 2023, 4:14pm

Hello @andreatera, when you escape the newline character as \\n, it is treated as a literal string, which is why you see \n in your logs. Have you tried using a script processor with Java to add the newline character?
See: Script processor (Elasticsearch Guide)
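
To see what actually reaches the processor, it can help to decode the candidate values outside Elasticsearch. The snippet below is only an illustrative Python sketch, not something posted in this thread: it uses the standard `json` module to parse the `replacement` strings as they would be typed in a request body and prints the resulting code points. The dictionary keys are made-up labels. It covers JSON decoding only; how the gsub processor treats a leading backslash afterwards is not modelled here.

```python
import json

# Each raw string is the JSON text exactly as typed in the request body.
candidates = {
    "question": r'"\\n"',   # value from the original question
    "answer": r'"\\\n"',    # value from the pipeline reported to work
    "plain": r'"\n"',       # plain JSON newline escape, for comparison
}

# Parse each JSON string literal into the characters it represents.
decoded = {name: json.loads(text) for name, text in candidates.items()}

for name, value in decoded.items():
    # Print the code points that remain after JSON parsing.
    print(name, [hex(ord(ch)) for ch in value])
```

Here `\\n` decodes to a backslash followed by the letter n, `\\\n` decodes to a backslash followed by a real line feed (U+000A), and `\n` decodes to a line feed alone.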

andreatera · January 16, 2023, 9:01am

Good morning @Ayush_Mathur, and thanks for the quick reply.
Unfortunately, I couldn't find an example of how to set up a script processor (Java language) in ingest pipelines.
I tried it like this

but it couldn't be compiled.

andreatera · January 16, 2023, 9:39am

Thanks @stephenb, if I PUT the following processor definition, it works fine!

	{
	  "gsub": {
		"description": "Replace the unicode \\u2028 with \\n",
		"field": "message",
		"target_field": "message",
		"pattern": "\u2028",
		"replacement": "\\\n"
	  }
	},
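
If the pipeline is created from a script rather than typed by hand, the escaping can be left to a JSON library. A minimal Python sketch, assuming the definition above is the desired result (the `processor` and `body` names are only illustrative): the Python strings hold the real characters, and `json.dumps` writes the escapes.

```python
import json

# Python-side values: the pattern is the single character U+2028,
# the replacement is a backslash followed by a real newline.
processor = {
    "gsub": {
        "field": "message",
        "target_field": "message",
        "pattern": "\u2028",
        "replacement": "\\\n",
    }
}

# json.dumps escapes non-ASCII characters, backslashes and newlines,
# so the resulting text can be sent as the request body unchanged.
body = json.dumps(processor, indent=2)
print(body)
```

The printed body contains `"pattern": "\u2028"` and `"replacement": "\\\n"`, the same pattern and replacement values as in the definition above.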

system · February 13, 2023, 9:40am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
