Elasticsearch - ingest pipeline - convert string field to integer

Dear Community Members,

  • I have the following document in an index called raw_marvel_movies
$ cat 1doc.json |jq '._source'
{
  "year": "2015",
  "id": "4",
  "name": "Avengers: Age of Ultron",
  "Final box office ranks (billion)": 1.405,
  "Opening weekend box office (million)": 191.27
}
$ cat 1doc.json |jq '._source.id'
"4"
  • I want to convert _source.id from a string to an integer
  • To do so, I used an Elasticsearch ingest pipeline and the _reindex API, like so:
PUT _ingest/pipeline/convert_str2int_ingest_pipeline
{
  "description": "convert string to integer pipeline with handled exceptions",
  "version": 1,
  "processors": [
    {
      "convert": {
        "field": "_source.id",
        "type": "integer"
      }
    }
  ]
}
POST _reindex
{
  "source": {
    "index": "raw_marvel_movies"
  },
  "dest": {
    "index": "marvel_movies",
    "pipeline": "convert_str2int_ingest_pipeline"
  }
}
  • Why do I receive the following error response? :confused: unable to convert [id] to integer
{
  "took": 172,
  "timed_out": false,
  "total": 48,
  "updated": 0,
  "created": 47,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "marvel_movies3",
      "type": "doc",
      "id": "GN3JV2YBQcY2JSA-GX-Q",
      "cause": {
        "type": "exception",
        "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: unable to convert [id] to integer",
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "java.lang.IllegalArgumentException: unable to convert [id] to integer",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "unable to convert [id] to integer",
            "caused_by": {
              "type": "number_format_exception",
              "reason": "For input string: \"id\""
            }
          }
        },
        "header": {
          "processor_type": "convert"
        }
      },
      "status": 500
    }
  ]
}

I'm obviously doing something wrong here, probably for lack of caffeine, but I cannot see what it is :confused: because with the _simulate API I don't get any errors:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "convert": {
          "field": "_source.id",
          "type": "integer"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "marvel_movies",
      "_type": "doc",
      "_id": "Gd3JV2YBQcY2JSA-GX-Q",
      "_score": 1,
      "_source": {
        "year": "2015",
        "id": "4",
        "name": "Avengers: Age of Ultron",
        "final_box_office_ranks_billion": 1.405,
        "opening_weekend_box_office_million": 191.27
      }
    }
  ]
}

Response:

{
  "docs": [
    {
      "doc": {
        "_index": "marvel_movies",
        "_type": "doc",
        "_id": "Gd3JV2YBQcY2JSA-GX-Q",
        "_source": {
          "year": "2015",
          "final_box_office_ranks_billion": 1.405,
          "name": "Avengers: Age of Ultron",
          "id": 4,
          "opening_weekend_box_office_million": 191.27
        },
        "_ingest": {
          "timestamp": "2018-10-09T10:08:50.372Z"
        }
      }
    }
  ]
}

Here the id field is indeed an integer.

Any advice? Thanks in advance.

kr,

If you fetch the document that failed, what is the value of id?

GET raw_marvel_movies/doc/GN3JV2YBQcY2JSA-GX-Q

Based on the error (the number_format_exception reports For input string: "id"), it looks like the document may have

"id": "id"

instead of an integer encoded as a string.
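
To check this kind of thing without re-running the reindex, a small local test over exported hits can help. The following is only a sketch: `find_unconvertible` is a made-up helper name, the second hit's content is a guess at what an indexed header row might look like, and Python's `int()` is a bit more lenient than the Java parsing behind the convert processor (it tolerates surrounding whitespace, for example), so treat it as an approximation.

```python
def find_unconvertible(docs, field="id"):
    """Return (doc_id, value) pairs whose field cannot be parsed as an integer."""
    bad = []
    for doc in docs:
        value = doc["_source"].get(field)
        try:
            int(value)
        except (TypeError, ValueError):
            # TypeError covers a missing field (None), ValueError a non-numeric string
            bad.append((doc["_id"], value))
    return bad


# Hypothetical hits: the first mirrors the document from the question,
# the second is an assumed header row that was indexed by mistake.
hits = [
    {"_id": "Gd3JV2YBQcY2JSA-GX-Q", "_source": {"id": "4", "name": "Avengers: Age of Ultron"}},
    {"_id": "GN3JV2YBQcY2JSA-GX-Q", "_source": {"id": "id", "name": "name"}},
]

print(find_unconvertible(hits))
```

Any pair it returns points at a document that the convert processor would most likely reject as well.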

Damn! You're right, that was a stupid mistake on my part: the dataset comes from a CSV file and I indexed the file's header row as a document :frowning:
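
For anyone hitting the same problem, one way to avoid indexing the header is to let the CSV reader consume it. This is a minimal sketch, assuming the file has an `id` column and a header row; `rows_to_docs` and the sample columns are made up for illustration, and how the resulting documents get sent to Elasticsearch is left out.

```python
import csv
import io


def rows_to_docs(csv_text):
    """Turn CSV text into a list of dicts, one per data row."""
    # DictReader uses the first line as field names, so the header
    # is never emitted as a data row.
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row in reader:
        # Convert at the source so no ingest-time conversion is needed.
        row["id"] = int(row["id"])
        docs.append(row)
    return docs


# Assumed sample layout, not the real dataset
sample = "id,name,year\n4,Avengers: Age of Ultron,2015\n"
docs = rows_to_docs(sample)
print(docs)
```

Converting `id` while reading also means a bad value fails early, before anything is indexed.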

Thanks for your answer and your time.

This topic can be closed.
