Duplicate field during reindex

Hello! I am reindexing from a 1.3.4 cluster to a 7.1.1 cluster, but I get an error during the reindex:

curl -X POST "$ES/_reindex?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": {
      "host": "http://42.42.42.42:9200"
    },
    "index": "xxxxxx-2017-09",
    "size": 1000
  },
  "dest": {
    "index": "xxxxxx-2017-09"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._id=null"
  }
}
'

{
  "error": {
    "root_cause": [
      {
        "type": "exception",
        "reason": "Error parsing the response, remote is likely not an Elasticsearch instance"
      }
    ],
    "type": "exception",
    "reason": "Error parsing the response, remote is likely not an Elasticsearch instance",
    "caused_by": {
      "type": "x_content_parse_exception",
      "reason": "[1:157528] [search_response] failed to parse field [hits]",
      "caused_by": {
        "type": "x_content_parse_exception",
        "reason": "[1:157528] [hits] failed to parse field [hits]",
        "caused_by": {
          "type": "x_content_parse_exception",
          "reason": "[1:157528] [hit] failed to parse field [_source]",
          "caused_by": {
            "type": "parsing_exception",
            "reason": "[hit] failed to parse [_source]",
            "line": 1,
            "col": 157528,
            "caused_by": {
              "type": "json_parse_exception",
              "reason": "Duplicate field 'text'\n at [Source: org.apache.http.nio.entity.ContentInputStream@3f28eba4; line: 1, column: 157700]",
              "suppressed": [
                {
                  "type": "illegal_state_exception",
                  "reason": "Failed to close the XContentBuilder",
                  "caused_by": {
                    "type": "i_o_exception",
                    "reason": "Unclosed object or array found"
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "status": 500
}

Apparently the problem comes from a duplicated field.

Do you know how I can find the documents with the duplicated field, and maybe how to delete them?

Thank you and have a good day!

PS: I ran the same reindex operation for other indices on the same cluster and everything went fine. :(

Hi, I searched a little bit. Maybe define _source.include manually?
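
To find which documents carry the duplicated field, one option is to read the raw search response from the old cluster and look for repeated keys before any parser collapses them. The sketch below is only an illustration under assumptions: it expects the unparsed response text of a _search call against the 1.3.4 cluster (fetched with curl, for example), and the names TrackedObject, nested_duplicates and hits_with_duplicate_fields are made up for this example, not part of any Elasticsearch client.

```python
import json


class TrackedObject(dict):
    """A dict that remembers keys duplicated in it or in objects nested inside it."""

    def __init__(self, pairs):
        # dict keeps only the last value for a repeated key, so inspect the pairs first.
        super().__init__(pairs)
        seen = set()
        self.duplicates = set()
        for key, value in pairs:
            if key in seen:
                self.duplicates.add(key)
            seen.add(key)
            self.duplicates |= nested_duplicates(value)


def nested_duplicates(value):
    """Collect duplicated keys recorded anywhere below a parsed JSON value."""
    if isinstance(value, TrackedObject):
        return value.duplicates
    if isinstance(value, list):
        found = set()
        for item in value:
            found |= nested_duplicates(item)
        return found
    return set()


def hits_with_duplicate_fields(raw_response):
    """Map each hit's _id to the sorted duplicated keys found in its _source."""
    # object_pairs_hook receives every JSON object as a list of (key, value) pairs,
    # which is the only point where repeated keys are still visible.
    response = json.loads(raw_response, object_pairs_hook=TrackedObject)
    report = {}
    for hit in response["hits"]["hits"]:
        duplicates = nested_duplicates(hit.get("_source"))
        if duplicates:
            report[hit["_id"]] = sorted(duplicates)
    return report
```

The returned _id values could then be used to inspect, fix or delete those documents on the old cluster before reindexing. For a large index the response would have to be fetched page by page, which this sketch leaves out.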


Hi, thank you for your quick response!

I ran this request (with the include):

curl -X POST "$ES/_reindex?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": {
      "host": "http://42.42.42.42:9200"
    },
    "index": "xxxxxxxxxxxx-2017-09",
    "size": 1000,
    "_source": {
      "include": [
        "xxxxxxxxxxxx",
        "text",
        "xxxxxxxxxxxx"
      ]
    }
  },
  "dest": {
    "index": "xxxxxxxxxxxx-2017-09"
  },
  "script": {
    "source": "ctx._id=null;"
  }
}
'

And no error!

But I can't query the content of the index anymore.

The index is not empty:

health status index                   docs.count store.size
green  open   xxxxxxxxxxxx-2017-09    2424000      4.5gb

But a search returns zero documents:

curl -X GET -H 'Content-Type: application/json' "$ES/xxxxxxxxxxxx-2017-09/_search"

{"took":4,"timed_out":false,"_shards":{"total":12,"successful":12,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]}}

Do you know why, please?

Never mind, it works. I tried again, and after a while the data was there. I don't know why.

Thank you very much!
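
The thread does not establish why the first search came back empty. One possibility, and it is only a guess, is that the search ran before the newly indexed documents had become visible to search. A minimal sketch for checking this is to call the _refresh API on the index and then read _count. The helper names es_request and refresh_and_count are hypothetical, and base_url stands for whatever $ES points to.

```python
import json
import urllib.request


def es_request(base_url, method, path):
    """Send a body-less request to Elasticsearch and return the decoded JSON reply."""
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    req = urllib.request.Request(
        url, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def refresh_and_count(base_url, index, request=es_request):
    """Refresh the index, then return the number of documents visible to search."""
    # POST <index>/_refresh makes recently indexed documents searchable.
    request(base_url, "POST", f"{index}/_refresh")
    # GET <index>/_count answers with a JSON object that has a "count" field.
    return request(base_url, "GET", f"{index}/_count")["count"]
```

If the count after the refresh matches docs.count, the empty result was probably a matter of timing rather than a problem with the reindexed data.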