Dynamic template mapping overridden by automatic `dense_vector` inference for float arrays

I encountered a situation where a dynamic template specifying a field as float is overridden by automatic dense_vector mapping when indexing a long float array in Elasticsearch.

My intention is to store embeddings as a plain float array (not as a vector field), but Elasticsearch automatically converts the field to dense_vector, ignoring the dynamic template.

According to the documentation, an unmapped long float array is dynamically mapped as a vector, but in this case I had already applied a dynamic template:

"Unmapped array fields of float elements with size between 128 and 4096 are dynamically mapped as dense_vector."

Example

On Elasticsearch 9.2.4

Create an index with a dynamic template:

PUT test_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "embedding_as_float": {
          "path_match": "*.embedding",
          "mapping": {
            "type": "float",
            "index": false
          }
        }
      }
    ]
  }
}

Index a document:

POST test_index/_doc
{
  "text_embedding": {
    "embedding": [
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,
      0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0
    ]
  }
}

Mapping result:

The text_embedding.embedding field has become dense_vector despite the dynamic template.

GET test_index/_mapping
{
  "test_index": {
    "mappings": {
      "dynamic_templates": [
        {
          "embedding_as_float": {
            "path_match": "*.embedding",
            "mapping": {
              "index": false,
              "type": "float"
            }
          }
        }
      ],
      "properties": {
        "text_embedding": {
          "properties": {
            "embedding": {
              "type": "dense_vector",
              "dims": 150,
              "index": true,
              "similarity": "cosine",
              "index_options": {
                "type": "int8_hnsw",
                "m": 16,
                "ef_construction": 100
              }
            }
          }
        }
      }
    }
  }
}

Additional observations

  • The dynamic template appears to work if the array is very short, but once the float array becomes longer it is automatically mapped as dense_vector.

  • If I change the dynamic template type to integer, the template works and the field is not converted to a vector, so the path_match itself is working as intended.

  • When index: false is specified in the template, the automatically created dense_vector mapping ignores this and sets index: true.

Why I want this behavior

I store embeddings for two different purposes:

  • One field is used for kNN search (so dense_vector is appropriate).

  • Another field is only for storing the raw embedding for later retrieval.

Because of the vector changes in newer versions of Elasticsearch, vector fields are excluded from _source by default unless "exclude_vectors": false is specified in queries. I would prefer to store this field simply as a float array (index: false) rather than a dense_vector.
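For reference, the per-query opt-in mentioned above looks roughly like the following. This is a minimal sketch that reuses the index name from the example; the match_all query is only a placeholder.

```console
GET test_index/_search
{
  "_source": {
    "exclude_vectors": false
  },
  "query": {
    "match_all": {}
  }
}
```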

Question

Is this behavior expected (automatic vector inference overriding dynamic templates), or is there a recommended way to disable vector auto-detection so that the dynamic template mapping is respected?

I found a potential root cause and raised an issue on GitHub.

Hi @giga811 !

Thanks for opening the bug. I’ve confirmed that it reproduces, and thanks for taking the trouble to provide a minimal reproduction.

While we prioritize work on it, I just wanted to check whether you have a viable workaround.

If you want to keep the original value exactly as it was indexed, you probably want _source (it already does that for you). If you really want to keep the original raw field in _source, you can set "index.mapping.exclude_source_vectors": false in your index settings, so you don’t need to specify that on every query.
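A minimal sketch of applying this setting at index creation, reusing the index name from the example. As far as I know the setting defaults to true on newly created indices in recent versions (vectors excluded from _source), so keeping vectors in _source means setting it to false; please check this against the documentation for your version.

```console
PUT test_index
{
  "settings": {
    "index.mapping.exclude_source_vectors": false
  }
}
```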

If you want to get the indexed value (not exactly what you sent, but close enough), remember that you can use fields to retrieve the value even if the field is not kept in _source.
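A sketch of retrieving the value through the fields option, using the index and field names from the example above. Setting "_source": false is optional and is only here to show that the value does not come from _source.

```console
GET test_index/_search
{
  "_source": false,
  "fields": ["text_embedding.embedding"]
}
```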

Hope that helps!

Thank you for the response.

Yes, we are currently using the "index.mapping.exclude_source_vectors": false option. One of our motivations was to stop using this option so that the Elasticsearch index is optimized for the new way of storing and handling vectors, since we also have actual vector fields for vector search.

I am considering the alternative of dynamically mapping the field as integer; we don’t really use the field for search, so a different type is fine for us. What matters more is that the field is marked index: false. I saw that even with the integer type, source retrieval returns the raw float numbers, so I think it will suffice for our use case.
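The integer workaround would look roughly like this. It is the template from the original example with only the type changed; the template name embedding_as_integer is a hypothetical label.

```console
PUT test_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "embedding_as_integer": {
          "path_match": "*.embedding",
          "mapping": {
            "type": "integer",
            "index": false
          }
        }
      }
    ]
  }
}
```

One caveat, stated as an assumption rather than something verified here: _source keeps the original floats, but anything read from the indexed field itself (doc values, for instance) would likely see values coerced to integers.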