Client request timeout

I have an index where the field "vjid" has the wrong data type.

I tried to convert "vjid" values from string to integer using this command:

POST vjdb/_update_by_query
{
  "script": {
    "source": "ctx._source.vjid = Integer.parseInt(ctx._source.vjid)",
    "lang": "painless"
  },
  "query": {
    "exists": {
      "field": "vjid"
    }
  }
}

It works fine at first, but after a few tens of seconds it suddenly stops with this message:

{
  "statusCode": 502,
  "error": "Bad Gateway",
  "message": "Client request timeout"
}

This is a sample of the documents in the index before the update:

"hits": [
      {
        "_index": "vjdb",
        "_id": "uGhUGI8ByGfsded-LttS",
        "_score": 1,
        "_source": {
          "vjid": "12575001"
        }
      },
      {
        "_index": "vjdb",
        "_id": "uWhUGI8ByGfsded-LttS",
        "_score": 1,
        "_source": {
          "vjid": "12575271"
        }
      },...

And this is the same sample after the script has run, which is the correct output:

"hits": [
      {
        "_index": "vjdb",
        "_id": "uGhUGI8ByGfsded-LttS",
        "_score": 1,
        "_source": {
          "vjid": 12575001
        }
      },
      {
        "_index": "vjdb",
        "_id": "uWhUGI8ByGfsded-LttS",
        "_score": 1,
        "_source": {
          "vjid": 12575271
        }
      },...

The output shows that it works for roughly the first 10K documents, but at some point the request terminates and the rest of the data is not converted.
I tried slicing the input and increasing the timeout, but neither worked. Is there a reliable way to convert all 20 million documents with this script?
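Independently of the mapping question discussed below, the 502 "Client request timeout" is most likely returned by Kibana's console proxy rather than by Elasticsearch, so the update may keep running in the background after the error appears. A minimal sketch, assuming the request is sent from Kibana Dev Tools: start the update as a background task with wait_for_completion=false, let slices=auto parallelise it, and guard the script so documents that were already converted become no-ops instead of making Integer.parseInt fail on a value that is no longer a string. Note that this still only rewrites _source; it does not change the mapping.

```
# Start the update as a background task; the response contains a task ID
POST vjdb/_update_by_query?wait_for_completion=false&slices=auto&conflicts=proceed
{
  "script": {
    "source": """
      // Only parse values that are still strings
      if (ctx._source.vjid instanceof String) {
        ctx._source.vjid = Integer.parseInt(ctx._source.vjid);
      } else {
        // Already converted (e.g. by an earlier run): skip the write
        ctx.op = 'noop';
      }
    """,
    "lang": "painless"
  },
  "query": {
    "exists": {
      "field": "vjid"
    }
  }
}

# Replace <task_id> with the "task" value returned above to follow progress
GET _tasks/<task_id>
```

Because the request returns immediately with a task ID, the proxy timeout no longer matters; progress and any failures are reported by the tasks API instead of the original HTTP response.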


What is the mapping for this field? I guess it's text, right?
In that case, your script only changes the JSON _source content; the field is still indexed as text.
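One way to check this is the get field mapping API. If the field was created by dynamic mapping from a JSON string, it will typically show up as text with a keyword sub-field.

```
# Show how vjid is currently mapped in the vjdb index
GET vjdb/_mapping/field/vjid
```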

Yes, it is a string in the mapping, but I need it to be an integer.
The code I provided does the conversion, but it times out after running for a while, so I cannot apply it to all of the indexed data.

If you need an integer because you want to filter by value (for example with range queries) or compute aggregations, what you are doing is not going to work as long as the field is still mapped as text and not as integer.

The mapping of an existing field cannot be changed in place, so you need to create a new index with the right mapping and reindex your data into it. That's the best option IMO.
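A minimal sketch of the new index, assuming the hypothetical name vjdb_v2. The integer type covers values up to 2,147,483,647, which is enough for the sample IDs shown above; if IDs can grow beyond that, long would be the safer choice.

```
# Hypothetical destination index; vjid is mapped as integer from the start
PUT vjdb_v2
{
  "mappings": {
    "properties": {
      "vjid": {
        "type": "integer"
      }
    }
  }
}
```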

Look at the reindex API. I would also use an ingest pipeline with a convert processor to change the field from string to number on the fly while reindexing.
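A sketch of that approach, assuming a hypothetical pipeline name (vjid-to-integer) and a hypothetical destination index vjdb_v2 that was created beforehand with vjid mapped as integer. The reindex is started with wait_for_completion=false so that, for 20 million documents, Kibana's request timeout does not interrupt it, and the simulate call lets you check the conversion on a single document first.

```
# Hypothetical pipeline: convert vjid from string to integer during ingest
PUT _ingest/pipeline/vjid-to-integer
{
  "description": "Convert vjid from string to integer",
  "processors": [
    {
      "convert": {
        "field": "vjid",
        "type": "integer",
        "ignore_missing": true
      }
    }
  ]
}

# Try the pipeline on a sample document before running it on the whole index
POST _ingest/pipeline/vjid-to-integer/_simulate
{
  "docs": [
    { "_source": { "vjid": "12575001" } }
  ]
}

# Copy vjdb into vjdb_v2 through the pipeline, as a background task
POST _reindex?wait_for_completion=false&slices=auto
{
  "source": {
    "index": "vjdb"
  },
  "dest": {
    "index": "vjdb_v2",
    "pipeline": "vjid-to-integer"
  }
}

# Replace <task_id> with the "task" value returned by the reindex call
GET _tasks/<task_id>
```

Once the task has completed, queries and aggregations can target vjdb_v2, where vjid is a real integer field rather than text.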
