Wrong dynamic mapping in Elasticsearch 8.11 prevents indexing of arrays of more than 127 strings

Since 8.11.0, when dynamic mapping is used, a defect prevents indexing documents that have an array field containing more than 127 strings.

Here is how to reproduce:

1- start Elasticsearch 8.11.0:

docker run -p 9201:9200 -it -m 1GB -e xpack.security.enabled=false -e discovery.type=single-node  docker.elastic.co/elasticsearch/elasticsearch:8.11.0

2- create an index

PUT /testindex
body: {}

3- try to insert this doc:

PUT /testindex/_doc/foo
body:
{
  "list": [
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo"]
}
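
The 128-element body above is tedious to type by hand. A minimal Python sketch that generates it and sends it could look like the following. The helper names `make_doc` and `put_doc` are made up for illustration, and the base URL assumes the `-p 9201:9200` port mapping from the docker command in step 1 with security disabled.

```python
import json
import urllib.error
import urllib.request


def make_doc(n: int, value: str = "foo") -> dict:
    """Build a document whose "list" field holds n copies of value."""
    return {"list": [value] * n}


def put_doc(base_url: str, index: str, doc_id: str, doc: dict):
    """PUT the document and return (HTTP status, decoded response body).

    Hypothetical helper: it only wraps a plain HTTP PUT to /<index>/_doc/<id>.
    """
    req = urllib.request.Request(
        f"{base_url}/{index}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # Error responses (such as the 400 reported in this post) land here.
        return err.code, json.loads(err.read())


if __name__ == "__main__":
    # 128 strings is the smallest array size reported to fail.
    print(json.dumps(make_doc(128)))
    # With the container from step 1 running, uncomment to send it:
    # print(put_doc("http://localhost:9201", "testindex", "foo", make_doc(128)))
```

Calling `make_doc(127)` instead gives a body just under the reported threshold, which is handy for comparing the two cases.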

Elasticsearch fails with the following error:

{
  "error":{
    "root_cause":[
      {
        "type":"parsing_exception",
        "reason":"Failed to parse object: expecting token of type [VALUE_NUMBER] but found [VALUE_STRING]",
        "line":1,
        "col":10
      }
    ],
    "type":"document_parsing_exception",
    "reason":"[1:10] failed to parse: Failed to parse object: expecting token of type [VALUE_NUMBER] but found [VALUE_STRING]",
    "caused_by":{
      "type":"parsing_exception",
      "reason":"Failed to parse object: expecting token of type [VALUE_NUMBER] but found [VALUE_STRING]",
      "line":1,
      "col":10
    }
  },
  "status":400
}

And in the index mappings, you can see that this new field mapping was created:

"mappings": {
  "_doc": {
    "properties": {
      "list": {
        "dims": 128,
        "similarity": "cosine",
        "index": true,
        "type": "dense_vector"
      }
    }
  }
},

Instead, we should get the same mapping as in previous versions:

"mappings": {
  "_doc": {
    "properties": {
      "list": {
        "type": "text",
        "fields": {
          "keyword": {
            "ignore_above": 256,
            "type": "keyword"
          }
        }
      }
    }
  }
},
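
A possible workaround, not mentioned in this thread, is to declare the field explicitly when creating the index, since dynamic mapping only applies to fields that are not already mapped. The sketch below builds such a create-index body in Python. The function name `explicit_list_mapping` is made up for illustration, and the body uses the typeless mapping format (no `_doc` level) that Elasticsearch 8.x expects on index creation.

```python
import json


def explicit_list_mapping() -> dict:
    """Create-index body mapping "list" as text with a keyword sub-field.

    This mirrors the mapping that dynamic mapping produced in earlier versions.
    """
    return {
        "mappings": {
            "properties": {
                "list": {
                    "type": "text",
                    "fields": {
                        "keyword": {"type": "keyword", "ignore_above": 256}
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    # Use the printed JSON as the body of: PUT /testindex
    print(json.dumps(explicit_list_mapping(), indent=2))
```

With the field mapped up front, the length of the string array should no longer influence which field type is chosen.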

If the array contains fewer than 128 items, indexing works fine.

Is it a defect or a new undocumented limitation?


Yep, looks like a bug! Thank you @JulienCarnec for finding it!


Thanks for reporting this.
I can indeed reproduce this behavior. It looks like a bug to me, but let me ask the team :wink:

Merci Julien :slight_smile:


Ha! @BenTrent is already on it :wink:

Here is the full script BTW:

DELETE /testindex
PUT /testindex
PUT /testindex/_doc/foo
{
  "list": [
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo",
    "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo"]
}
GET /testindex/_mapping

Here is a GitHub issue tracking the bug: "Array of strings incorrectly indexed as `dense_vector`" (Issue #101965, elastic/elasticsearch)

Getting it addressed ASAP!
