Reindex "host" to "host.ip"

Hello, we recently moved the Logstash instance that receives our SNMP data over to a data stream. All of the current data has worked fine, but when I went to reindex the past several months it failed, because the old indices stored the IP address directly in "host" rather than making "host" an object with the IP in "host.ip".

POST /_reindex?pretty
{
  "source": {
    "index": "snmp-logstash-2022.05.06"
  },
  "dest": {
    "index": "metrics-logs-snmp-logstash-datastream",
    "op_type": "create"
  }
}

Result:

    "failures" : [
      {
        "index" : ".ds-metrics-logs-snmp-logstash-datastream-2022.05.06-000001",
        "id" : "sVSuloAB0t9pHq-JN6hm",
        "cause" : {
          "type" : "mapper_parsing_exception",
          "reason" : "object mapping for [host] tried to parse field [host] as object, but found a concrete value"
        },
        "status" : 400
      },
      ...

Those indices no longer have incoming data, and we would like to delete them after moving their contents to the data stream, to get them off of our hot servers. Is there a way to move the value into a new field "host.ip" and change "host" into an object rather than a text field before reindexing into the data stream?

Yup, you need to reindex with an ingest pipeline.
Your ingest pipeline will do the transforms.
See Here for reindex with pipeline.

Your ingest pipeline can be pretty simple, using a set, rename, and/or perhaps a drop processor, etc.
See processors here.

I created a script similar to the example on the reindex page, but since the target field is a parent.child path it is giving errors.

POST /_reindex?pretty
{
  "source": {
    "index": "snmp-logstash-2022.05.06"
  },
  "dest": {
    "index": "metrics-logs-snmp-logstash-datastream",
    "op_type": "create"
  },
  "script": {
    "source": "ctx._source.host.ip =ctx._source.remove(\"host\")"
  }
}

Result:

"script_stack" : [
          "ctx._source.host.ip =ctx._source.remove(\"host\")",
          "                ^---- HERE"
        ],

When I write the field as ("host.ip") the result is similar, except the error points at the parenthesis. Is there a way to quote that field name so that the period is recognized as part of the name rather than as part of the command? Also, that command would assign the value to the new field before removing "host", so there would be no loss of data, correct?
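For anyone who wants to stay with the script approach: the error most likely comes from "host" being a plain string in the old documents, so Painless cannot assign a child field ".ip" to it. A minimal, untested sketch of a script that pulls the old value out first and then writes "host" back as an object is below. The index names are the ones from this thread; the instanceof guard is an added assumption so that documents where "host" is already an object are left alone.

```
POST /_reindex?pretty
{
  "source": {
    "index": "snmp-logstash-2022.05.06"
  },
  "dest": {
    "index": "metrics-logs-snmp-logstash-datastream",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.host instanceof String) { def ip = ctx._source.remove('host'); ctx._source.host = ['ip': ip]; }"
  }
}
```

Here remove('host') returns the old string before the field is deleted, and the map literal ['ip': ip] then recreates "host" as an object, so the value is not lost in between.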

Here is an ingest pipeline that should work:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "field": "host_ip",
          "value": "{{host}}"
        }
      },
      {
        "remove": {
          "field": "host"
        }
      },
      {
        "rename": {
          "field": "host_ip",
          "target_field": "host.ip"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "host": "192.169.0.1"
      }
    }
  ]
}

Result

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_id" : "_id",
        "_source" : {
          "host" : {
            "ip" : "192.169.0.1"
          }
        },
        "_ingest" : {
          "timestamp" : "2022-05-19T16:59:17.273241836Z"
        }
      }
    }
  ]
}
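
Note that the _simulate call above only tests the processors against sample documents; it does not store anything. To reference the pipeline from _reindex it has to be saved under a name first. A sketch of that step, assuming the name convert_hostip (the name that appears in the reindex request later in this thread) and the same three processors:

```
PUT _ingest/pipeline/convert_hostip
{
  "description": "Move the old string value of host into host.ip",
  "processors": [
    {
      "set": {
        "field": "host_ip",
        "value": "{{host}}"
      }
    },
    {
      "remove": {
        "field": "host"
      }
    },
    {
      "rename": {
        "field": "host_ip",
        "target_field": "host.ip"
      }
    }
  ]
}
```

The stored pipeline is then selected by setting "pipeline": "convert_hostip" inside the "dest" section of the reindex request.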

Thank you for showing me the ingest pipeline you meant, everything is working great now!

One more question: almost everything moved to the data stream, and the host.ip fix worked like you said. However, 5 indices did not work, reporting a "version conflict":

    "failures" : [
      {
        "index" : ".ds-metrics-logs-snmp-logstash-datastream-2022.05.06-000001",
        "id" : "5nMQgoAB5Iam7sEGpX0G",
        "cause" : {
          "type" : "version_conflict_engine_exception",
          "reason" : "[5nMQgoAB5Iam7sEGpX0G]: version conflict, document already exists (current version [1])",
          "index_uuid" : "JAdCuPPiRpOb6m8zi5vnbw",
          "shard" : "0",
          "index" : ".ds-metrics-logs-snmp-logstash-datastream-2022.05.06-000001"
        },
        "status" : 409
      },
      ...
    ]

Is there a way to tell it to ignore the documents that already exist and still move the rest?

Never mind, I found that adding the "conflicts" option lets the reindex skip those documents and keep moving the rest into the data stream (for others who may read this):

POST /_reindex?pretty
{
  "conflicts": "proceed",
  "source": {
    "index": "snmp-logstash-2022.05.01"
  },
  "dest": {
    "index": "metrics-logs-snmp-logstash-datastream",
    "pipeline": "convert_hostip",
    "op_type": "create"
  }
}
