Reindex from Remote ParsingException

Hi!

We are currently migrating from our Elasticsearch 2.4 to 5.2.2 (or even 5.3.0).
We decided to reindex the data instead of using snapshot/restore.

For reindexing I use the Reindex API, which has worked quite well from what I have seen so far.
But for one index (so far) the reindex aborts with the following error:

{
	"error": {
		"root_cause": [{
			"type": "parsing_exception",
			"reason": "[_shards] failed to parse field [failures]",
			"line": 1,
			"col": 347
		}],
		"type": "parsing_exception",
		"reason": "[search_response] failed to parse field [_shards]",
		"line": 1,
		"col": 347,
		"caused_by": {
			"type": "parsing_exception",
			"reason": "[_shards] failed to parse field [failures]",
			"line": 1,
			"col": 347,
			"caused_by": {
				"type": "illegal_argument_exception",
				"reason": "[failure] index doesn't support values of type: VALUE_NULL"
			}
		}
	},
	"status": 400
}

Cluster log:

The process doesn't return this error immediately.
It reindexes around 40 GB of the 80 GB and then stops with this error.

On the one hand, it seems that the response from the remote server is not valid JSON.
I saw a related code change that improves the error message in 5.3.0 (https://github.com/elastic/elasticsearch/pull/22536).

But I am not sure if it is just the API or if there is something wrong with a document.
We have already had a case where reindex aborted because a document in the 2.4 cluster was not valid JSON.
I changed the log level to debug, but I don't know if I can see which document it is failing on.
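
One way to narrow this down, as a rough sketch only: the error above says that an entry under `_shards.failures` in the remote search response carries a null value for a field such as `index`. If the raw JSON of a search or scroll response from the 2.4 cluster can be captured (for example by running the search directly against it), a small script could list the failure entries that contain nulls. The function name `shard_failures_with_nulls` is hypothetical, and the failure entry layout (`shard`, `index`, `reason`) is an assumption, not something confirmed in this thread.

```python
import json


def shard_failures_with_nulls(raw_response):
    """Return (position, failure) pairs for shard failures that contain a null field.

    raw_response is the JSON text of a search/scroll response. The layout
    {"_shards": {"failures": [...]}} is assumed from the error message.
    """
    response = json.loads(raw_response)
    # "failures" may be absent or null when every shard succeeded.
    failures = response.get("_shards", {}).get("failures") or []
    return [
        (position, failure)
        for position, failure in enumerate(failures)
        if any(value is None for value in failure.values())
    ]


# Hypothetical response, made up for illustration (not taken from the real cluster).
sample = json.dumps({
    "_shards": {
        "total": 5,
        "successful": 4,
        "failed": 1,
        "failures": [
            {"shard": -1, "index": None, "reason": {"type": "hypothetical_exception"}}
        ],
    },
    "hits": {"hits": []},
})

for position, failure in shard_failures_with_nulls(sample):
    print(position, failure)
```

If this reports an entry, the problem would more likely be a failing shard on the remote side than a single malformed document, but that is only a guess until the real response is inspected.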

Thanks in advance for checking the issue!


Please don't post pictures of text. They are difficult to read, and some people may not even be able to see them :slight_smile:

What does the reindex request look like?

Sorry for the image but it was the easiest way to post the stacktrace (character limit in the post, file type limit to images).

Here is the request:

curl -XPOST 'localhost:9200/_reindex?wait_for_completion=true' -d '
{
    "conflicts": "proceed",
    "source": {
        "remote": {
            "host": "http://remote.host.com:9200"
        },
        "index": "index-2017-1"
    },
    "dest": {
        "index": "index-2017-1"
    }
}'
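
For repeated experiments, the same request can be built from a script. This is a minimal sketch, not a fix for the error: it only constructs the request and leaves sending to the caller. The function name `build_reindex_request` and its parameters are hypothetical. The optional `size` inside `source` sets the batch size that reindex uses when reading from the source, and trying a smaller batch is my own suggestion, not something from the posts above.

```python
import json
import urllib.request


def build_reindex_request(dest_url, remote_host, index, batch_size=None,
                          wait_for_completion=True):
    """Build (but do not send) a reindex-from-remote request.

    dest_url is the cluster receiving the data, remote_host the old cluster.
    """
    source = {"remote": {"host": remote_host}, "index": index}
    if batch_size is not None:
        # Optional: number of documents fetched per batch from the remote.
        source["size"] = batch_size
    body = {
        "conflicts": "proceed",
        "source": source,
        "dest": {"index": index},
    }
    flag = "true" if wait_for_completion else "false"
    return urllib.request.Request(
        f"{dest_url}/_reindex?wait_for_completion={flag}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_reindex_request(
    "http://localhost:9200",
    "http://remote.host.com:9200",
    "index-2017-1",
    batch_size=100,
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`, which is left out here so the sketch runs without a cluster.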

As you can see it is a time-based index per day.

NP!
You can use gist/pastebin/etc for that sort of thing too :slight_smile:

Thank you for the idea!

Elasticsearch logs:

