Is it possible to ignore failure while using the Reindex API?

Hello,

I'm trying to reindex an index and got a failure caused by a mapping parsing exception, which is kind of expected, since in this data the field can change from object to text.

For this reason, the destination index has index.mapping.ignore_malformed set to true, so that it drops only the field with the conflicting mapping, not the entire document.

But the reindex stops at the first failure, and I saw no option in the documentation to tell it to ignore failures. The only option is to proceed on version conflicts, which is not the case here.
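For context, here is a minimal sketch of the one tolerance switch mentioned above. The index names are placeholders, not the real ones. With conflicts set to proceed, version conflicts are counted in version_conflicts instead of aborting the request; to my knowledge it has no effect on mapping failures, which are still reported under failures and stop the reindex.

```console
# Sketch only: "source-index" and "dest-index" are placeholder names
POST _reindex
{
  "conflicts": "proceed",
  "source": { "index": "source-index" },
  "dest": { "index": "dest-index", "op_type": "create" }
}
```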

Is there any way to force the reindex API to proceed on these cases?

Hi @leandrojmp

How do you have an existing index with 2 different mappings, concrete and object? Or are you reindexing from more than 1 source? Perhaps I am not understanding...

Can you provide a couple examples?

You might need to write an ingest pipeline to handle the data issues...

Hello Stephen,

It is not an existing index with 2 different mappings; the source index has one mapping and the destination index has another.

It is data from AWS CloudTrail, which can have a huge number of fields that can be objects or concrete values depending on the event. We don't map all the levels of these fields and use dynamic mapping for them, so Elasticsearch maps each field according to its first occurrence.

The issue is that I was using monthly indices, something like index-2023.03, and in the middle of the day I decided to change to daily indices, like index-2023.03.01. In the monthly index this field (and probably others) was mapped as a concrete value, but in the daily index it got mapped as an object.
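To illustrate how two indices can end up disagreeing, here is a small sketch. The index names (monthly-demo, daily-demo) and the field name (payload) are hypothetical, and it assumes automatic index creation is enabled and that no index template overrides dynamic mapping.

```console
# The first document decides the dynamic mapping of "payload" in each index
POST monthly-demo/_doc
{
  "payload": "plain string"
}

POST daily-demo/_doc
{
  "payload": { "id": "abc" }
}

# Compare the two mappings: a concrete field in one index, an object in the other
GET monthly-demo,daily-demo/_mapping
```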

This is expected, and we use the ignore_malformed setting to avoid dropping entire documents when a mapping parsing error occurs, but it looks like this setting is ignored by the reindex API and the reindex process halts on the first failure.

Hmmm, interesting. It looks like ignore_malformed in the destination worked for me... perhaps it's the dynamic part?

DELETE my-index-000001

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "number_one": {
        "type": "integer",
        "ignore_malformed": true
      },
      "number_two": {
        "type": "integer"
      }
    }
  }
}

POST my-index-000001/_doc/
{
  "text":       "Some text value",
  "number_one": "foo",
  "number_two": 3 
}

POST my-index-000001/_doc/
{
  "text":       "Some text value",
  "number_one": 1,
  "number_two": 3 
}

POST my-index-000001/_doc/
{
  "text":       "Some text value",
  "number_one": 2,
  "number_two": 3 
}

POST my-index-000001/_doc/
{
  "text":       "Some text value",
  "number_one": "bar",
  "number_two": 3 
}

GET my-index-000001/_search
{
  "fields": [
    "*"
  ]
}


PUT my-index-dest
{
  "mappings": {
    "properties": {
      "number_one": {
        "type": "integer",
        "ignore_malformed": true
      },
      "number_two": {
        "type": "integer"
      }
    }
  }
}

POST _reindex
{
  "source": {"index" : "my-index-000001"},
  "dest": {"index": "my-index-dest"}
}

# Results
{
  "took": 21,
  "timed_out": false,
  "total": 4,
  "updated": 0,
  "created": 4,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}

GET my-index-dest/_search
{
  "fields": [
    "*"
  ]
}

# Results

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my-index-dest",
        "_id": "MiuzuIYBTddQNYvBZZqp",
        "_score": 1,
        "_ignored": [
          "number_one"
        ],
        "_source": {
          "text": "Some text value",
          "number_one": "foo",
          "number_two": 3
        },
        "fields": {
          "number_two": [
            3
          ],
          "text.keyword": [
            "Some text value"
          ],
          "text": [
            "Some text value"
          ]
        },
        "ignored_field_values": {
          "number_one": [
            "foo"
          ]
        }
      },
      {
        "_index": "my-index-dest",
        "_id": "MyuzuIYBTddQNYvBZZrT",
        "_score": 1,
        "_source": {
          "text": "Some text value",
          "number_one": 1,
          "number_two": 3
        },
        "fields": {
          "number_two": [
            3
          ],
          "number_one": [
            1
          ],
          "text.keyword": [
            "Some text value"
          ],
          "text": [
            "Some text value"
          ]
        }
      },
      {
        "_index": "my-index-dest",
        "_id": "NCuzuIYBTddQNYvBZZrh",
        "_score": 1,
        "_source": {
          "text": "Some text value",
          "number_one": 2,
          "number_two": 3
        },
        "fields": {
          "number_two": [
            3
          ],
          "number_one": [
            2
          ],
          "text.keyword": [
            "Some text value"
          ],
          "text": [
            "Some text value"
          ]
        }
      },
      {
        "_index": "my-index-dest",
        "_id": "NSuzuIYBTddQNYvBZZru",
        "_score": 1,
        "_ignored": [
          "number_one"
        ],
        "_source": {
          "text": "Some text value",
          "number_one": "bar",
          "number_two": 3
        },
        "fields": {
          "number_two": [
            3
          ],
          "text.keyword": [
            "Some text value"
          ],
          "text": [
            "Some text value"
          ]
        },
        "ignored_field_values": {
          "number_one": [
            "bar"
          ]
        }
      }
    ]
  }
}
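The _ignored entries in the response above can also be queried directly, which helps when checking after a reindex which documents lost a field. A short sketch against the same my-index-dest index:

```console
# All documents in which at least one field was ignored
GET my-index-dest/_search
{
  "query": {
    "exists": { "field": "_ignored" }
  }
}

# Only documents in which "number_one" specifically was ignored
GET my-index-dest/_search
{
  "query": {
    "term": { "_ignored": "number_one" }
  }
}
```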

Yeah, you are right, it works. I just checked the documentation and it is pretty clear now:

You can’t use ignore_malformed with the following data types:

It just doesn't work for object fields; the entire document will be dropped.
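As a counterpart to the integer test earlier in the thread, here is a sketch of the object case. The index name (conflict-demo) and field name (payload) are made up for illustration. On the versions discussed here I would expect the second request to be rejected with a mapping parsing error even though ignore_malformed is enabled at the index level; behaviour may differ on other versions.

```console
# Index-level ignore_malformed is on, but "payload" is mapped as an object
PUT conflict-demo
{
  "settings": {
    "index.mapping.ignore_malformed": true
  },
  "mappings": {
    "properties": {
      "payload": {
        "properties": {
          "id": { "type": "keyword" }
        }
      }
    }
  }
}

# A concrete value sent to the object field: the whole document is expected to be rejected
POST conflict-demo/_doc
{
  "payload": "some string"
}
```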

Ugggh... so for concrete vs. object issues, I have an ingest pipeline somewhere that checks the field to see if it is concrete and then writes the concrete value as a sub-field in the object. The problem is that you would have to know them all a priori.
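A rough sketch of what such a pipeline could look like, not the actual pipeline referred to above. The pipeline name, the field name (payload), the sub-field name (value), and the index names are all assumptions for illustration, and it only handles one known field: arrays and nested levels are left alone.

```console
PUT _ingest/pipeline/wrap-concrete-in-object
{
  "description": "Sketch: move a concrete value into a sub-field so the field is always an object",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          // "payload" and "value" are hypothetical names
          def v = ctx['payload'];
          if (v != null && !(v instanceof Map)) {
            ctx['payload'] = ['value': v];
          }
        """
      }
    }
  ]
}

# Try the pipeline on one concrete and one object value before using it
POST _ingest/pipeline/wrap-concrete-in-object/_simulate
{
  "docs": [
    { "_source": { "payload": "some string" } },
    { "_source": { "payload": { "id": "abc" } } }
  ]
}

# Apply it during the reindex through dest.pipeline
POST _reindex
{
  "source": { "index": "source-index" },
  "dest": { "index": "dest-index", "pipeline": "wrap-concrete-in-object" }
}
```

The destination mapping would also need a matching sub-field (here payload.value), or dynamic mapping has to be allowed to create it.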
