Dynamic mapping: Confusing type inference behaviour

Hi,

I'm using Elasticsearch 6.8.2.

I've been playing with mappings and I've stumbled across this behavior for which I can't find an explanation:

# Step 0: Creating an index with dynamic mappings enabled
PUT /test-datatypes
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "_doc" : {
            "properties" : {
                "shortfield" : { "type" : "short" }
            }
        }
    }
}

# Step 1: Add a valid short int
POST /test-datatypes/_doc/1
{
  "shortfield": 1
}

# Step 2: Add a float with many decimals
POST /test-datatypes/_doc/2
{
  "shortfield": 1.1234532145432145765432145867543211234256
}

# Step 3: Add the same float as a JSON string
POST /test-datatypes/_doc/3
{
  "shortfield": "1.1234532145432145765432145867543211234256"
}

# Step 4: Try to add a float with a big integer part
POST /test-datatypes/_doc/4
{
  "shortfield": 112312.1234532145432145765432145867543211234256
}

# => As expected, this throws:
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [shortfield] of type [short] in document with id '4'"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [shortfield] of type [short] in document with id '4'",
    "caused_by": {
      "type": "json_parse_exception",
      "reason": "Numeric value (112312.1234532145432145765432145867543211234256) out of range of Java short\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@2a672a72; line: 2, column: 64]"
    }
  },
  "status": 400
}

If I search the index I get this:

GET test-datatypes/_search


{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test-datatypes",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          # As expected
          "shortfield" : 1
        }
      },
      {
        "_index" : "test-datatypes",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          # Unexpected: I would have expected this to throw an error at index time
          # My guess: decimals are truncated to fit in the number of bits of a short
          "shortfield" : 1.1234532145432146
        }
      },
      {
        "_index" : "test-datatypes",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          # Unexpected: I would have expected this to throw an error at index time
          # Surprising: not truncated, returned as the raw string
          "shortfield" : "1.1234532145432145765432145867543211234256"
        }
      }
    ]
  }
}

If I look at the mappings I get:

GET test-datatypes/_mapping


{
  "test-datatypes" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "shortfield" : {
            "type" : "long"
          }
        }
      }
    }
  }
}

The things I don't understand are:

  1. Why can docs 2 and 3 be indexed when their values do not match the short numeric type?
  2. Why is doc 2's shortfield returned truncated while doc 3's is not?

I've been trying to find answers in the documentation without success.

Could someone please enlighten me as to why Elasticsearch behaves this way?
Thank you


Take a look at the coerce setting for numeric fields: when it is enabled, strings are converted to numbers and fractions are truncated for integer types. It is enabled by default; see https://www.elastic.co/guide/en/elasticsearch/reference/7.3/number.html

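As you can see in the mapping, by default a long is picked, and the floating point numbers get coerced into a long. Coercion only affects the value that gets indexed. The _source field holds the JSON document you originally sent, which is why doc 3 comes back as the untouched string. The shorter number in doc 2 looks like ordinary double-precision rounding of the JSON number rather than truncation to fit a short.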
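
If you would rather have such documents rejected at index time, coercion can be switched off per field with the coerce mapping parameter. A minimal sketch, assuming a fresh index (the name test-datatypes-strict is made up for illustration) and the same 6.x _doc type as in your example:

```
# Hypothetical index name; same mapping as before, but with coercion disabled
PUT /test-datatypes-strict
{
    "mappings" : {
        "_doc" : {
            "properties" : {
                "shortfield" : { "type" : "short", "coerce" : false }
            }
        }
    }
}

# A plain integer that fits in a short should still be accepted
POST /test-datatypes-strict/_doc/1
{
  "shortfield": 1
}

# A number with a fractional part should now be rejected instead of truncated
POST /test-datatypes-strict/_doc/2
{
  "shortfield": 1.5
}

# A numeric string should now be rejected instead of converted
POST /test-datatypes-strict/_doc/3
{
  "shortfield": "1.5"
}
```

I have not run this against 6.8.2, so treat the comments as the behaviour described in the documentation rather than verified output.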

Hope this helps.


Thank you for your answer. I missed the coerce setting!
