Gsub processor in ingest pipeline cannot replace a pattern with a newline

Hello there,
Since we are moving from a Logstash pipeline to an ingest pipeline, we had to rewrite our Logstash gsub pattern as an ingest processor.
Our use case is simple: replace the Unicode character \u2028 with a newline character (\n) so that multi-line messages (for example, stack traces) are displayed correctly.

Before, in Logstash, we had:

      # Replace the unicode \u2028 with \n, which Kibana will display as a new line (we write \u2028 instead of \n so that we receive the full log event)
      mutate {
        gsub => [ "message", '\u2028', "
" # Seems that passing a string with an actual newline in it is the only way to make gsub work.
        ]
      }

And now we define it in an ingest pipeline with the gsub processor:

"gsub": {
  "description": "Replace the unicode \u2028 to return character",
  "field": "message",
  "target_field": "message",
  "pattern": "\\u2028",
  "replacement": "\\n"
}

Unfortunately, this puts the literal string "\n" into the message. I couldn't find a way to insert an actual newline character.

How can I solve it?

Thanks in advance

Hi @andreatera, I tried this:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "gsub": {
          "description": "Replace the unicode \u2028 to return character",
          "field": "message",
          "target_field": "message",
          "pattern": "\u2028",
          "replacement": "\\\n"        
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Test Message with a \u2028 here"
      }
    }
  ]
}


and got this

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "message": """Test Message with a 
 here"""
        },
        "_ingest": {
          "timestamp": "2023-01-13T16:10:28.321783962Z"
        }
      }
    }
  ]
}

Perhaps that will help / work?

I just ingested a document (i.e. not simulate) and it seems to work...

PUT _ingest/pipeline/discuss-test
{
  "processors": [
    {
      "gsub": {
        "description": "Replace the unicode \u2028 to return character",
        "field": "message",
        "target_field": "message",
        "pattern": "\u2028",
        "replacement": "\\\n"
      }
    }
  ]
}



POST discuss-test/_doc/?pipeline=discuss-test
{
  "message": "Test Message with a \u2028 here"
}

GET discuss-test/_search

#Results
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "discuss-test",
        "_id": "CRrqq4UBurnLzVTzm9Nd",
        "_score": 1,
        "_source": {
          "message": """Test Message with a 
 here"""
        }
      }
    ]
  }
}

Hello @andreatera, when you escape the newline character as \\n, it is treated as a literal string, which is why you see \n in your logs. Have you tried using a script processor to add the newline character? Its scripts are written in Painless, which has a Java-like syntax.
See: Script processor | Elasticsearch Guide [master] | Elastic
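
To make the escaping point concrete, here is a minimal sketch of what a JSON parser hands over for each spelling of the replacement, before the processor ever sees it. Python's json module stands in for Elasticsearch's own parser here, which is an assumption, although JSON string escapes are standardized. The dictionary keys are just labels made up for this sketch.

```python
import json

# Raw JSON text of three "replacement" values, exactly as they would be typed
# in a request body. Raw Python strings keep the backslashes untouched.
candidates = {
    "question": r'"\\n"',    # form used in the original post
    "answer":   r'"\\\n"',   # form that worked in the replies
    "plain":    r'"\n"',     # single JSON newline escape, for comparison
}

# Decode each literal the way any JSON parser would.
decoded = {name: json.loads(raw) for name, raw in candidates.items()}

for name, value in decoded.items():
    # Print code points so the invisible line feed (0xa) is easy to spot.
    print(name, candidates[name], "->", [hex(ord(c)) for c in value])

# question "\\n"  -> ['0x5c', '0x6e']   backslash, letter n
# answer "\\\n"   -> ['0x5c', '0xa']    backslash, line feed
# plain "\n"      -> ['0xa']            line feed only
```

So "\\n" reaches the processor without any line feed in it, while "\\\n" carries a real line feed after a backslash. If the gsub replacement is handled like a Java regex replacement string (an assumption, not verified here), a backslash simply escapes the character after it, which would explain why the second form ends up as a plain newline.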

Good morning @Ayush_Mathur, and thanks for the quick reply.
Unfortunately, I couldn't find an example of how to set up a script processor (Java language) in ingest pipelines.
I tried it like this


but it can't be compiled.
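
For anyone who lands here looking for the script-processor route: scripts in ingest pipelines are written in Painless rather than Java. Below is a rough sketch, not tested against a cluster, that builds a processor definition in Python and prints the JSON body. It passes both characters in through `params` so that no escape sequences are needed inside the script source. The parameter names `sep` and `nl` are hypothetical, chosen only for this example.

```python
import json

# Hypothetical script processor: replaces U+2028 with a line feed.
# "sep" and "nl" are made-up parameter names for this sketch.
script_processor = {
    "script": {
        "lang": "painless",
        # String.replace(CharSequence, CharSequence) does a literal, non-regex replace.
        "source": "ctx.message = ctx.message.replace(params.sep, params.nl);",
        "params": {"sep": "\u2028", "nl": "\n"},
    }
}

# json.dumps writes the two characters back out as JSON escapes,
# so the printed body is safe to paste into a request.
body = json.dumps({"processors": [script_processor]}, indent=2)
print(body)
```

The printed JSON is meant to be used as the body of a PUT _ingest/pipeline request, in the same way as the gsub definitions in this thread.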

Thanks @stephenb, if I PUT the following processor definition, it works fine!

	{
	  "gsub": {
		"description": "Replace the unicode \\u2028 with \\n",
		"field": "message",
		"target_field": "message",
		"pattern": "\u2028",
		"replacement": "\\\n"
	  }
	},
