Cannot reindex due to bad id field(s)

I ran into an issue while trying to reindex in order to add a copy_to search field. The reindex fails with:

"type": "action_request_validation_exception",
"reason": "Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 877;"

During the last reindex, ids were written that were longer than 512 bytes.

I was hoping that we could target these particular bad _ids with this script:

{
  "script": {
    "source": "if (ctx._id.length() > 512) {ctx._id = 'null'}",
    "lang": "painless"
  },
  "source": {
    "index": "testing"
  },
  "dest": {
    "index": "testing_v5"
  }
}

But I still get the same error as above.

Also, I noticed that when I do a large reindex with ?wait_for_completion=false, even after the task says it's complete, it seems to take a long time before the docs are available.

Here are my latest issues:

Our index has a number of docs with bad ids. They were set by this script during a previous reindex:

def lowerCaseAuthor = ctx.author_string != null ? ctx.author_string.toLowerCase() : null;
def lowerCasePub = ctx.publication_fqdn != null ? ctx.publication_fqdn.toLowerCase() : null;
ctx._id =  lowerCaseAuthor + lowerCasePub + ctx.collection_id

The problem is that some of our author_string fields are very long, more than 256 characters, which makes the ids longer than the 256-character limit I had assumed.
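
One way to avoid overlong ids in the first place is to cap them when they are built. The following is a minimal Python sketch, not the Painless script above: build_id and MAX_ID_BYTES are hypothetical names, missing values are treated as empty strings (the original script does not do that), and the SHA-256 fallback is an assumption of this sketch rather than anything Elasticsearch requires.

```python
import hashlib

# Limit quoted in the validation error; assumed to apply to the UTF-8 bytes of the id.
MAX_ID_BYTES = 512


def build_id(author, publication, collection_id):
    """Concatenate the lowercased parts; fall back to a hash if the result is too long."""
    parts = [
        (author or "").lower(),       # missing author becomes "" (sketch choice)
        (publication or "").lower(),  # missing publication becomes "" (sketch choice)
        str(collection_id),
    ]
    candidate = "".join(parts)
    if len(candidate.encode("utf-8")) <= MAX_ID_BYTES:
        return candidate
    # A SHA-256 hex digest is always 64 ASCII characters, so it always fits.
    return hashlib.sha256(candidate.encode("utf-8")).hexdigest()
```

Short inputs keep a readable id, for example build_id("Ann", "Example.COM", 7) gives "annexample.com7", while a very long author_string yields a fixed 64-character id. The trade-off is that hashed ids are no longer human-readable.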

Now that I am trying to reindex again, I am getting frequent errors:

"error": {
    "type": "action_request_validation_exception",
    "reason": "Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 529;"
  }

I had thought that maybe I could filter out the bad docs in the reindex. I came up with this query, which seems to work fine as a standalone query:

{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": "doc['_id'].value.length() > 256"
        }
      }
    }
  }
}

I use it to count the number of docs with ids that are too long. But when I apply the same script filter, with the comparison inverted, to my reindex:

{
  "source": {
    "index": "testing",
    "query": {
      "bool": {
        "filter": {
          "script": {
            "script": "doc['_id'].value.length() < 256"
          }
        }
      }
    }
  },
  "dest": {
    "index": "testing_v6"
  }
}

I am back to square one. It runs a bit and then I get the same validation error as above:

"error": {
    "type": "action_request_validation_exception",
    "reason": "Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 529;"
  }

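The solution is to treat the id field as limited to 128 CHARACTERS. The 512 in the error message is a limit in bytes, not characters, and a single character can take more than one byte.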
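
To illustrate why a character count well below 512 is needed, here is a small Python sketch. It assumes the limit is checked against the UTF-8 encoding of the id; id_byte_length, fits, and MAX_ID_BYTES are hypothetical names. Note that Python's len() counts code points, whereas length() in Painless counts UTF-16 units, so the character counts are not identical between the two.

```python
# Limit quoted in the validation error, assumed to be measured in UTF-8 bytes.
MAX_ID_BYTES = 512


def id_byte_length(doc_id):
    """Number of bytes the id occupies when encoded as UTF-8."""
    return len(doc_id.encode("utf-8"))


def fits(doc_id):
    """True if the id is within the byte limit."""
    return id_byte_length(doc_id) <= MAX_ID_BYTES


ascii_id = "a" * 300            # 300 characters, 1 byte each
accent_id = "\u00e9" * 300      # 300 characters, 2 bytes each
emoji_id = "\U0001F600" * 129   # 129 characters, 4 bytes each

print(len(ascii_id), id_byte_length(ascii_id), fits(ascii_id))     # 300 300 True
print(len(accent_id), id_byte_length(accent_id), fits(accent_id))  # 300 600 False
print(len(emoji_id), id_byte_length(emoji_id), fits(emoji_id))     # 129 516 False
```

Since a character needs at most 4 bytes in UTF-8, 128 characters can never exceed 512 bytes, which is why a 128-character cutoff is safe while 256 is not.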
