Ingest pipeline creates an empty object when renaming a nested JSON field

Hello guys!

I've been trying to ingest documents through the ingest pipeline provided by Elastic. The idea is quite simple: transform the original JSON into one with well-formatted field names, so that my search queries look clean and pretty later on.

My original object is this one:

{
  "TXM_Envelope": {
    "TXM_Header": "my_header"
  }
}

After the transformation, I would like to see something like this:

{
   "envelope":{
      "header":"my_header"
   }
}

I accomplished this by adding the following rename processor to the ingest pipeline (it is the only processor I have):

[
  {
    "rename": {
      "field": "TXM_Envelope.TXM_Header",
      "target_field": "envelope.header",
      "ignore_missing": true
    }
  }
]

But along with the transformed JSON, an empty object with the original name (TXM_Envelope) was created and added to the document's _source field:

{
   "_index":"test-messages",
   "_id":"D5bQ4oQBxa54t0YLYWH0",
   "_score":1,
   "_source":{
      "TXM_Envelope":{
      },
      "envelope":{
         "header":"my_header"
      }
   }
}
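
A quick way to reproduce this without indexing anything is the simulate pipeline API. This is only a minimal sketch that reuses the single rename processor and the sample document from this post; it should show the same empty TXM_Envelope object next to envelope.header as the hit above.

```console
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "field": "TXM_Envelope.TXM_Header",
          "target_field": "envelope.header",
          "ignore_missing": true
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "TXM_Envelope": {
          "TXM_Header": "my_header"
        }
      }
    }
  ]
}
```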

The only way that I could remove the original root field was by adding an additional Remove processor to the pipeline:

[
   {
      "rename":{
         "field":"TXM_Envelope.TXM_Header",
         "target_field":"envelope.header",
         "ignore_missing":true
      }
   },
   {
      "remove":{
         "field":"TXM_Envelope"
      }
   }
]

So, I would like to know whether there is a way to prevent the empty original field from being indexed at all, and whether you have any idea why I am seeing this behavior.

Note: These are my index mappings and settings:

{
  "settings": {
    "index": {
      "default_pipeline": "test-messages-pipeline"
    }
  },
  "mappings": {
    "properties": {
      "envelope": {
        "properties": {
          "header": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Thank you all!

Best regards,

Douglas Korgut

Hi Douglas,

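I tried simulating this and found that you have to rename the outer object first and then the inner field to avoid the issue you are facing. The rename processor moves only the field it is given, so renaming just the inner field leaves the now-empty parent object behind.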

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "field": "TXM_Envelope",
          "target_field": "envelope",
          "ignore_missing": true
        }
      },
      {
        "rename": {
          "field": "envelope.TXM_Header",
          "target_field": "envelope.header",
          "ignore_missing": true
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "TXM_Envelope": {
          "TXM_Header": "my_header"
        }
      }
    }
  ]
}

Here is the result.

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "envelope": {
            "header": "my_header"
          }
        },
        "_ingest": {
          "timestamp": "2022-12-06T05:02:51.095224802Z"
        }
      }
    }
  ]
}
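
To make this permanent, the same two processors could be stored under the pipeline name from your index settings. This is a sketch, assuming test-messages-pipeline is the pipeline you want to overwrite; the description text is only illustrative.

```console
PUT _ingest/pipeline/test-messages-pipeline
{
  "description": "Rename TXM_Envelope to envelope, then TXM_Header to header",
  "processors": [
    {
      "rename": {
        "field": "TXM_Envelope",
        "target_field": "envelope",
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "envelope.TXM_Header",
        "target_field": "envelope.header",
        "ignore_missing": true
      }
    }
  ]
}
```

Since the index already sets default_pipeline to this name, newly indexed documents should pick up the change without any further configuration.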

Oh, that makes a lot of sense. I just tried your approach and got the expected results: no empty objects indexed in my _source document. Thanks a lot @Venkata_Raja!!
