Elastic stores string in numeric (long) field

Hello,

I noticed that elastic allows to store a string value in a numeric field that was mapped as long (as the string can be parsed as a number).

I'm not sure if this is a bug or a feature (and if it has a setting to control it).

The case I'm describing can be reproduced with this snippet

PUT my-index
{
  "mappings": {
    "test": {
      "dynamic": "strict",
      "properties": {
        "testId": {
          "type": "long"
        }
      }
    }
  }
}

PUT my-index/test/1
{
  "testId": 1
}

PUT my-index-2/test/2
{
  "testId": "2"
}

In the search result, one will be string and the other long

        "_source" : {
          "testId" : 1
        }
         "_source" : {
          "testId" : "2"
        }

Hi Yaniv,

Note in your example I think there is a typo as the second PUT is on a different index my-index-2, so most likely my-index-2 has created a text+keyword field if you did not also create it as type long

That being said, when you run a search "_source" will return the exact json payload which you sent. Value 2 or "2" are acceptable as the field will store a number in both cases since the field is already declared a long.

If you try this though, the document will be rejected:

PUT my-index/test/3
{
  "testId": "three"
}
# returns an error like failed to parse field [testId] of type [long] in document with id '3'. Preview of field's value: 'three'"

And you can run a max aggregation or check the mapping to see the datatype cannot change on an existing field :

#returns long
GET my-index/_mapping/test/field/testId

#returns 2.0
GET my-index/_search?size=0
{
  "aggs": {
    "max_testid": {
      "max": {
        "field": "testId"
      }
    }
  }
}

Hope this makes sense

1 Like

Thanks for your quick reply Julien. my-index-2 was just a typo in the copy and pasting, and it was all done on the same index.

I forgot to mention that I'm using elastic 6.8

I'm pasting here again all the steps

PUT my-index
{
  "mappings": {
    "test": {
      "dynamic": "strict",
      "properties": {
        "testId": {
          "type": "long"
        }
      }
    }
  }
}

PUT my-index/test/1
{
  "testId": 1
}

PUT my-index/test/2
{
  "testId": "2"
}

GET my-index/_search

The response is:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index",
        "_type" : "test",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "testId" : "2"
        }
      },
      {
        "_index" : "my-index",
        "_type" : "test",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "testId" : 1
        }
      }
    ]
  }
}

Yes v6.x is clear as declaring mapping with document type is deprecated in v6, and yes I did expect a typo. Let me know if you need clarification, the previous answer should cover that "2" is just coming from the json payload you sent, and value is a long not a string since the field has long datatype (you cannot change the datatype on an existing field - without recreating the index and reindexing)
Thanks

I'm not sure you got me right (or I didn't get you right). The typo I mentioned was only in the example I posted. I then fixed it and shared in my second comment the correct reproduction steps and the responses I get from elastic.

Elastic accepts both values 1 and "2" on the same testId field under the same index my-index (which is mapped as long) and stores the first one as long and the second one as string.
As you mentioned in your first response, it won't accept a value like "three", which proves that it does makes a validation check, but at the end of the day, it stores the "2" as a string "2".
I would expect to either get an exception thrown of that it would store it as 2 (long value, without the quotes).

I know 6.8 is going end of life soon. I haven't tested that in 7.x yet to see if the same.

Hi just my 2 cents... got me thinking a bit

When a normal search is performed in elasticsearch it returns the source document as the _source, what the actual source document that was indexed .... the actual values are stored in the elsewhere (example) doc_values so as you can see below when I do a search that returns both. In this case when you send in a "2" elasticsearch can properly convert that to a number upon indexing but it does not alter the source, that is all that is happening if it is desired to for the source to be correct ... then the correct source needs to be indexed, elasticsearch does not alter the source document, it is not a compiler like C++, if it can index it will, it has some "liberalness" to it to ease compatibility with JSON etc. and ease ingestion, I think that is perhaps considered a feature not a bug.

However when you look at the actual doc_values you can see that in fact both are actually numeric fields.

DELETE my-index

PUT my-index

PUT my-index
{
  "mappings": {
    "properties": {
      "testId": {
        "type": "long"
      }
    }
  }
}

PUT my-index/_doc/1
{
  "testId": 1
}

PUT my-index/_doc/2
{
  "testId": "2"
}

PUT my-index/_doc/3
{
  "testId": "three"
}

GET my-index/_search
{
  "docvalue_fields": ["testId"], 
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "testId": {
              "gt": 0
            }
          }
        }
      ]
    }
  }
}


# Result
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "my-index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "testId" : 1
        },
        "fields" : {
          "testId" : [
            1
          ]
        }
      },
      {
        "_index" : "my-index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.0,
        "_source" : {
          "testId" : "2"
        },
        "fields" : {
          "testId" : [
            2
          ]
        }
      }
    ]
  }
}

Thanks @stephenb.
We got to notice this behavior when we accidentally started indexing the numbers as strings, and as the range in your example, we were using aggs (that came back as a number) and failed to equal the values from the _source (which were coming back as strings).
IMO this behavior is tricky and confusing. But I agree with you that it gives some sort of tolerance.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.