How to replace multiple newlines with two newlines in ingest pipeline gsub

I want to replace multiple (three or more) newlines (\n\n\n) with two newlines (\n\n). If I set "\n\n" as the replacement string in the gsub processor, it replaces \n\n\n with nn.

Here you can find my _simulate ingest pipeline.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "gsub": {
          "field": "message",
          "pattern": """Page \d of \d""",
          "replacement": "",
          "ignore_missing": false,
          "description": "Remove Page x of x",
          "on_failure": [
            {
              "append": {
                "description": "Record error information",
                "field": "_ingestion_errors",
                "value": "Processor 'gsub' with tag 'remove_page_numbers' in pipeline '{{ _ingest.on_failure_pipeline }}' failed with message '{{ _ingest.on_failure_message }}'"
              }
            }
          ]
        }
      },
      {
        "gsub": {
          "field": "message",
          "pattern": "\\n\\n",
          "replacement": "\\n",
          "ignore_missing": false,
          "description": "Replace multiple newlines at the beginning",
          "on_failure": [
            {
              "append": {
                "description": "Record error information",
                "field": "_ingestion_errors",
                "value": "Processor 'gsub' with tag 'remove_page_numbers' in pipeline '{{ _ingest.on_failure_pipeline }}' failed with message '{{ _ingest.on_failure_message }}'"
              }
            }
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": """



Marketing Intern
May 2009 - August 2009 (4 months)
New York City Metropolitan Area

Jump PR
Public Relations Intern

  Page 2 of 3



   

May 2008 - August 2008 (4 months)
New York City Metropolitan Area

Education
University
Bachelor of Arts - BA, Communication and Media Studies · (August 2010 - May
2014)

Senior High School
 · (September 2006 - June 2010)

  Page 3 of 3"""
      }
    }
  ]
}


What does the replacement string have to look like so that it inserts \n\n?

Thanks for your help.

Hi @Bowfish

I thought we answered this here

Please do not open multiple topics with the same question.

Is there something different here?

@stephenb There is indeed something different about this post. The question here is what the gsub.replacement string has to look like so that the match from gsub.pattern is replaced with \n.

If I run this ingest pipeline simulation:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "gsub": {
          "description": "Remove unicode no break space",
          "field": "message",
          "ignore_missing": true,
          "pattern": "  ",
          "replacement": "",
          "tag": "replace_unicode_bullets",
          "on_failure": [
            {
              "append": {
                "description": "Record error information",
                "field": "_ingestion_errors",
                "value": "Processor 'gsub' with tag 'replace_unicode_bullets' in pipeline '{{ _ingest.on_failure_pipeline }}' failed with message '{{ _ingest.on_failure_message }}'"
              }
            }
          ]
        }
      },
      {
        "gsub": {
          "description": "Remove Page x of y",
          "field": "message",
          "ignore_missing": true,
          "pattern": "Page \\d of \\d",
          "replacement": "",
          "tag": "replace_unicode_bullets",
          "on_failure": [
            {
              "append": {
                "description": "Record error information",
                "field": "_ingestion_errors",
                "value": "Processor 'gsub' with tag 'replace_unicode_bullets' in pipeline '{{ _ingest.on_failure_pipeline }}' failed with message '{{ _ingest.on_failure_message }}'"
              }
            }
          ]
        }
      },
      {
        "gsub": {
          "field": "message",
          "pattern": """(\\n){3,}""",
          "replacement": """\\n\\n""",
          "ignore_missing": false,
          "description": "Replace multiple newlines",
          "on_failure": [
            {
              "append": {
                "description": "Record error information",
                "field": "_ingestion_errors",
                "value": "Processor 'gsub' with tag 'replace_multiple_newlines' in pipeline '{{ _ingest.on_failure_pipeline }}' failed with message '{{ _ingest.on_failure_message }}'"
              }
            }
          ]
        }
      },
      {
        "gsub": {
          "description": "Replace unicode bullets with dashes",
          "field": "message",
          "ignore_missing": true,
          "pattern": "•",
          "replacement": "-",
          "tag": "replace_unicode_bullets",
          "on_failure": [
            {
              "append": {
                "description": "Record error information",
                "field": "_ingestion_errors",
                "value": "Processor 'gsub' with tag 'replace_unicode_bullets' in pipeline '{{ _ingest.on_failure_pipeline }}' failed with message '{{ _ingest.on_failure_message }}'"
              }
            }
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": """
•Coordinated Scholarship event at Island\n\nEducation\n\n  Page 2 of 3\n\n\n\n   \nCollege (MA)\nBachelor's degree, English Studies and Journalism · (2010 - 2013)\n\nNew York University\nContinuing Studies, Graphic Design · (2014 - 2016)\n\nUniversity de Córdoba\nHispanic and Englis Languages, Literatures, and Linguistics,\nGeneral · (2012 - 2012)\n\nMiami College\nAssociate of Arts (A.A.), Mass Communication/Media Studies · (2008 - 2010)\n\nWheaton College (MA)\nBachelor of Arts (B.A.)  · (2010 - 2012)\n\n  Page 3 of 3"""
      }
    }
  ]
}

it replaces \n\n\n with \n, which is correct.

But if I save the same gsub like this:

"gsub": {
  "field": "message",
  "pattern": "(\\n){3,}",
  "replacement": "\\n\\n",
  "ignore_missing": false,
  "description": "Replace multiple newlines",
  "on_failure": [
    {
      "append": {
        "description": "Record error information",
        "field": "_ingestion_errors",
        "value": "Processor 'gsub' with tag 'replace_multiple_newlines' in pipeline '{{ _ingest.on_failure_pipeline }}' failed with message '{{ _ingest.on_failure_message }}'"
      }
    }
  ]
}

in an ingest pipeline and use this ingest pipeline when uploading documents, the pattern is replaced with n and not \n. What does the replacement string have to look like so that \n is inserted?

Thanks for your help.
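
A note on why n can show up instead of a newline: as far as I know, gsub is built on Java regular expressions, where a backslash in the replacement string makes the next character literal. A replacement that reaches the processor as backslash-n would therefore come out as a plain n. The Python sketch below only emulates that rule. java_style_expand is a hypothetical helper written for this illustration (not an Elasticsearch API, and it ignores $1-style group references), and json.loads stands in for the JSON parser.

```python
import json
import re


def java_style_expand(replacement: str) -> str:
    """Hypothetical helper: emulate backslash handling in a Java-style replacement.

    A backslash makes the following character literal. Group references
    such as $1 are not modelled here.
    """
    return re.sub(r"\\(.)", lambda m: m.group(1), replacement, flags=re.DOTALL)


# What the processor would receive after JSON decoding (assumption: plain JSON strings).
escaped = json.loads(r'"\\n\\n"')  # four characters: backslash, n, backslash, n
plain = json.loads(r'"\n\n"')      # two real newline characters

pattern = json.loads(r'"(\\n){3,}"')  # regex (\n){3,}: three or more real newlines
text = "a\n\n\n\nb"

# A function replacement keeps Python from applying its own escape rules.
out_escaped = re.sub(pattern, lambda m: java_style_expand(escaped), text)
out_plain = re.sub(pattern, lambda m: java_style_expand(plain), text)

print(repr(out_escaped))  # 'annb'
print(repr(out_plain))    # 'a\n\nb'
```

Under these assumptions, writing the replacement as "\n\n" in a regular JSON string (single backslashes) would hand the processor two real newlines. It is worth confirming that with _simulate against a document that contains real newlines.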

Sorry, I'm not sure I'm following. Are you saying the simulate works, but when you actually PUT the pipeline and ingest documents it does not work?

Is that correct?

Side note, probably could have kept this in the same thread

Yes. That is correct.

Thanks for the note. Next time I will keep related questions in the same thread.

Can you show the exact input and output you desire?

Also I am confused...

You can PUT exactly what you _simulate with ... why are you changing the syntax?

From this _simulate

          "field": "message",
          "pattern": """(\\n){3,}""",
          "replacement": """\\n\\n""",

to this PUT

  "pattern": "(\\n){3,}",
  "replacement": "\\n\\n",
  "ignore_missing": false,

Why are you doing that?
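
For anyone comparing the two forms, here is a rough Python sketch of why they are not interchangeable. It assumes that Console passes triple-quoted content through verbatim (modelled with raw strings below), while an ordinary JSON string has its escapes decoded (modelled with json.loads). The sample texts and the matches helper are made up for illustration.

```python
import json
import re

# PUT variant: ordinary JSON string, so the parser decodes the escapes.
put_pattern = json.loads(r'"(\\n){3,}"')  # becomes the regex (\n){3,}

# _simulate variant: triple-quoted content, modelled here as raw text.
sim_pattern = r"(\\n){3,}"                # stays the regex (\\n){3,}

real_newlines = "a\n\n\nb"   # document containing actual newline characters
literal_text = r"a\n\n\nb"   # document containing the two characters \ and n


def matches(pattern: str, text: str) -> bool:
    """Return True if the regex finds a match anywhere in the text."""
    return re.search(pattern, text) is not None


print(matches(put_pattern, real_newlines))  # True
print(matches(put_pattern, literal_text))   # False
print(matches(sim_pattern, real_newlines))  # False
print(matches(sim_pattern, literal_text))   # True
```

So the triple-quoted pattern targets the literal text \n (as in the triple-quoted sample message), while the regular-string pattern targets real newlines. Documents uploaded from files typically contain the latter.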

When I PUT the exact syntax from your simulate, I get exactly the same output from a simulate as I do when I actually post a document with the pipeline.

_simulate

          "message": """
-Coordinated Scholarship event at Island\n\nEducation\n\n \nCollege (MA)\nBachelor's degree, English Studies and Journalism · (2010 - 2013)\n\nNew York University\nContinuing Studies, Graphic Design · (2014 - 2016)\n\nUniversity de Córdoba\nHispanic and Englis Languages, Literatures, and Linguistics,\nGeneral · (2012 - 2012)\n\nMiami College\nAssociate of Arts (A.A.), Mass Communication/Media Studies · (2008 - 2010)\n\nWheaton College (MA)\nBachelor of Arts (B.A.)· (2010 - 2012)\n\n"""
        },

POST and _search

POST discuss-test/_doc/?pipeline=discuss-test
  {
        "message": """
•Coordinated Scholarship event at Island\n\nEducation\n\n  Page 2 of 3\n\n\n\n   \nCollege (MA)\nBachelor's degree, English Studies and Journalism · (2010 - 2013)\n\nNew York University\nContinuing Studies, Graphic Design · (2014 - 2016)\n\nUniversity de Córdoba\nHispanic and Englis Languages, Literatures, and Linguistics,\nGeneral · (2012 - 2012)\n\nMiami College\nAssociate of Arts (A.A.), Mass Communication/Media Studies · (2008 - 2010)\n\nWheaton College (MA)\nBachelor of Arts (B.A.)  · (2010 - 2012)\n\n  Page 3 of 3"""
      }

GET discuss-test/_search

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "discuss-test",
        "_id": "3Mbt7owByr6kya5vpOvZ",
        "_score": 1,
        "_ignored": [
          "message.keyword"
        ],
        "_source": {
          "message": """
-Coordinated Scholarship event at Island\n\nEducation\n\n \nCollege (MA)\nBachelor's degree, English Studies and Journalism · (2010 - 2013)\n\nNew York University\nContinuing Studies, Graphic Design · (2014 - 2016)\n\nUniversity de Córdoba\nHispanic and Englis Languages, Literatures, and Linguistics,\nGeneral · (2012 - 2012)\n\nMiami College\nAssociate of Arts (A.A.), Mass Communication/Media Studies · (2008 - 2010)\n\nWheaton College (MA)\nBachelor of Arts (B.A.)· (2010 - 2012)\n\n"""
        }
      }
    ]
  }
}

Here they are side by side

_simulate

-Coordinated Scholarship event at Island\n\nEducation\n\n \nCollege (MA)\nBachelor's degree, English Studies and Journalism · (2010 - 2013)\n\nNew York University\nContinuing Studies, Graphic Design · (2014 - 2016)\n\nUniversity de Córdoba\nHispanic and Englis Languages, Literatures, and Linguistics,\nGeneral · (2012 - 2012)\n\nMiami College\nAssociate of Arts (A.A.), Mass Communication/Media Studies · (2008 - 2010)\n\nWheaton College (MA)\nBachelor of Arts (B.A.)· (2010 - 2012)\n\n"""

POST and _search

-Coordinated Scholarship event at Island\n\nEducation\n\n \nCollege (MA)\nBachelor's degree, English Studies and Journalism · (2010 - 2013)\n\nNew York University\nContinuing Studies, Graphic Design · (2014 - 2016)\n\nUniversity de Córdoba\nHispanic and Englis Languages, Literatures, and Linguistics,\nGeneral · (2012 - 2012)\n\nMiami College\nAssociate of Arts (A.A.), Mass Communication/Media Studies · (2008 - 2010)\n\nWheaton College (MA)\nBachelor of Arts (B.A.)· (2010 - 2012)\n\n
