Index a substring

I am indexing bibliographic data with Kibana and I want to index just a substring of a field.

The field holds data like the example below, and I would like to index just the characters at positions 9 to 12 (counting from 0), which are always in the same place. In the example, that is 1881.

Here’s an example:

20010525d18811888m y0pory01030103ba

Is there any way of doing it?

Thank you for your help.

Hi Miguel, by default every Elasticsearch node can run ingest pipelines, which let you process documents before they are indexed.

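In your case, you can define an ingest pipeline that uses a script processor to extract the substring and assign it to a new field on the document. In the example below, this pipeline is called extract-substring. Note that the end index of substring is exclusive, so substring(9, 13) returns the characters at positions 9 to 12.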

PUT _ingest/pipeline/extract-substring
{
  "description" : "Extract a substring from a serial number",
  "processors" : [
    {
      "script": {
          "source": """
            ctx.extractedSubstring = ctx.sourceField.substring(9, 13)
          """
      }
    }
  ]
}
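
If some documents might have a missing or shorter sourceField, the script above would throw an error and the document would be rejected. A more defensive variant could look like the sketch below. It reuses the pipeline name and field names from this example; leaving such documents unchanged instead of failing them is my assumption, so adjust it to your data.

```
PUT _ingest/pipeline/extract-substring
{
  "description" : "Extract characters 9 to 12 of sourceField, skipping short or missing values",
  "processors" : [
    {
      "script": {
        "source": """
          // Assumption: documents without a long enough string value are left as they are
          if (ctx.sourceField instanceof String && ctx.sourceField.length() >= 13) {
            ctx.extractedSubstring = ctx.sourceField.substring(9, 13);
          }
        """
      }
    }
  ]
}
```

With this variant, a document whose sourceField is too short is still indexed, just without the extractedSubstring field.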

If you were to ingest a document and assign this pipeline like this:

PUT test/_doc/test-document?pipeline=extract-substring
{
  "sourceField": "20010525d18811888m y0pory01030103ba"
}

Then the indexed document will have this resulting shape:

{
  "sourceField" : "20010525d18811888m y0pory01030103ba",
  "extractedSubstring" : "1881"
}
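
To check the pipeline before indexing anything, you can run it against a sample document with the simulate API. A minimal sketch, using the same pipeline name and sample value as above:

```
POST _ingest/pipeline/extract-substring/_simulate
{
  "docs": [
    {
      "_source": {
        "sourceField": "20010525d18811888m y0pory01030103ba"
      }
    }
  ]
}
```

The response should show the processed document's _source with extractedSubstring set to 1881, and nothing is written to an index.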

Hi CJ,

Thank you very much for your answer and for the great explanation!

Best regards!
