Eland-imported naver/splade-v3 text_expansion produces much smaller sparse vectors and worse ranking than local SentenceTransformers SparseEncoder

Environment

  • Elasticsearch version: 9.4.2
  • Eland Docker image used: docker.elastic.co/eland/eland:9.2.0
  • Hugging Face model: naver/splade-v3
  • Elasticsearch model id: naver-splade-v3-m512
  • Task type: text_expansion
  • Max model input length: 512
  • Quantization: not enabled
  • Prefix strings: none
  • Test corpus: full DnD5eSRD.pdf, parsed into 875 chunks

Eland Import Command

docker run --rm -it "$ELAND_IMAGE" \
  eland_import_hub_model \
    --url "$ES_URL" \
    --es-username "$ES_USER" \
    --es-password "$ES_PASSWORD" \
    --hub-model-id naver/splade-v3 \
    --hub-access-token "$HF_TOKEN" \
    --task-type text_expansion \
    --es-model-id naver-splade-v3-m512 \
    --max-model-input-length 512 \
    --start

Elasticsearch Model Config Observed

{
  "model_id": "naver-splade-v3-m512",
  "model_type": "pytorch",
  "inference_config": {
    "text_expansion": {
      "vocabulary": {
        "index": ".ml-inference-native-000002"
      },
      "tokenization": {
        "bert": {
          "do_lower_case": true,
          "with_special_tokens": true,
          "max_sequence_length": 512,
          "truncate": "first",
          "span": -1
        }
      }
    }
  },
  "deployment_ids": ["naver-splade-v3-m512"]
}

Mapping

We explicitly disabled sparse-vector index pruning and source-vector exclusion:

{
  "settings": {
    "number_of_shards": 1,
    "index.mapping.exclude_source_vectors": false
  },
  "mappings": {
    "properties": {
      "doc_id": { "type": "keyword" },
      "chunk_id": { "type": "integer" },
      "title": { "type": "text", "analyzer": "spanish" },
      "source_file": { "type": "keyword" },
      "text_sha1": { "type": "keyword" },
      "text": { "type": "text", "analyzer": "spanish" },
      "splade_tokens": {
        "type": "sparse_vector",
        "index_options": {
          "prune": false
        }
      }
    }
  }
}

Ingest Pipeline

{
  "processors": [
    {
      "inference": {
        "model_id": "naver-splade-v3-m512",
        "input_output": [
          {
            "input_field": "text",
            "output_field": "splade_tokens"
          }
        ]
      }
    }
  ]
}

Query Methods Tested

1. Elasticsearch online sparse query

Elasticsearch encodes the query using the deployed model:

{
  "query": {
    "sparse_vector": {
      "field": "splade_tokens",
      "inference_id": "naver-splade-v3-m512",
      "query": "What are the specific methods for destroying each of the seven layers of a Prismatic Wall?",
      "prune": false
    }
  }
}

2. Elasticsearch sparse query with local precomputed query vector

Python computes the query vector with:

from sentence_transformers import SparseEncoder

model = SparseEncoder("naver/splade-v3")
query_embedding = model.encode_query([query], convert_to_tensor=True, convert_to_sparse_tensor=False)[0]

Then sends:

{
  "query": {
    "sparse_vector": {
      "field": "splade_tokens",
      "query_vector": {
        "...": "local SPLADE token weights"
      },
      "prune": false
    }
  }
}
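
The conversion from the local embedding to the query_vector body is not shown above. A minimal sketch of one way to do it, assuming the embedding has already been turned into a plain list of floats indexed by vocabulary id, and that `vocab` is the tokenizer's id-to-token list. `to_query_vector`, `vocab`, and `min_weight` are hypothetical names for illustration, not part of the original test code:

```python
def to_query_vector(weights, vocab, min_weight=0.0):
    """Map a dense SPLADE weight list to the {token: weight} form used by query_vector.

    weights: list of floats, one per vocabulary id (assumed layout).
    vocab:   list of token strings, where vocab[i] is the token for id i (assumed).
    """
    if len(weights) != len(vocab):
        raise ValueError("weights and vocab must have the same length")
    # Keep only tokens whose weight exceeds the threshold; zeros carry no signal.
    return {vocab[i]: float(w) for i, w in enumerate(weights) if w > min_weight}


# Toy example with a five-token vocabulary (illustrative values only).
toy_vocab = ["[PAD]", "wall", "prismatic", "layer", "fire"]
toy_weights = [0.0, 1.8, 2.4, 0.9, 0.0]
query_vector = to_query_vector(toy_weights, toy_vocab)

# Request body in the same shape as the query shown above.
body = {
    "query": {
        "sparse_vector": {
            "field": "splade_tokens",
            "query_vector": query_vector,
            "prune": False,
        }
    }
}
```

For the query tokens to match anything, the token strings would need to be identical to the token keys in the indexed vectors, including any subword markers.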

3. Local SentenceTransformers SPLADE ranking

Python computes document and query vectors locally:

model = SparseEncoder("naver/splade-v3")
doc_embeddings = model.encode_document(docs, convert_to_tensor=True, convert_to_sparse_tensor=False)
query_embedding = model.encode_query([query], convert_to_tensor=True, convert_to_sparse_tensor=False)[0]
score = sparse_dot(query_vector, doc_vector)
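
The `sparse_dot` helper is not defined in the snippet above. A minimal sketch of what such a helper could look like, assuming each vector has been converted to a {token: weight} dict. The `rank` function and the toy data are hypothetical and only illustrate how per-document scores turn into a ranking:

```python
def sparse_dot(query_vector, doc_vector):
    """Dot product of two sparse vectors stored as {token: weight} dicts."""
    # Iterate over the smaller dict so the cost scales with the shorter vector.
    if len(query_vector) > len(doc_vector):
        query_vector, doc_vector = doc_vector, query_vector
    return sum(w * doc_vector[t] for t, w in query_vector.items() if t in doc_vector)


def rank(query_vector, doc_vectors, top_k=10):
    """Return (doc_id, score) pairs sorted by descending score."""
    scored = [(doc_id, sparse_dot(query_vector, vec)) for doc_id, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data, not taken from the real corpus.
q = {"wall": 2.0, "layer": 1.0}
docs = {
    "a": {"wall": 1.5, "layer": 0.5, "red": 0.3},
    "b": {"fire": 2.0},
    "c": {"layer": 1.0},
}
ranking = rank(q, docs)
```

A document that shares no tokens with the query scores zero, which is why missing expansion tokens in the indexed vectors can push a relevant chunk out of the top 10.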

Main Result

Question:

What are the specific methods for destroying each of the seven layers of a Prismatic Wall?

The correct answer is in chunks:

  • dnd5esrd-0378
  • dnd5esrd-0379

The answer text includes:

The wall ... can be destroyed one layer at a time, in order from red to violet...
Red: at least 25 Cold damage
Orange: strong wind, such as Gust of Wind
Yellow: at least 60 Force damage
Green: Passwall or equal/higher spell opening a portal on a solid surface
Blue: at least 25 Fire damage
Indigo: Bright Light from Daylight
Violet: Dispel Magic

Ranking comparison:

Local SentenceTransformers SPLADE:
  dnd5esrd-0378 ranked #1, score 20.1291
  dnd5esrd-0379 ranked #10, score 9.0233

Elasticsearch sparse online:
  missed answer in top 10

Elasticsearch sparse with local precomputed query vector:
  missed answer in top 10

BM25 multi_match:
  dnd5esrd-0378 ranked #2

This suggests the problem is not limited to query-time inference in ES. Even with a locally computed query vector, ES sparse search misses the answer, which points to the indexed document vectors: those generated by ES inference differ substantially from the local SentenceTransformers document vectors.

Vector Size / Similarity Evidence

Full-book run over 875 chunks:

Local doc dims mean: 537.848
ES stored vector dims mean: 59.258
Mean cosine between local doc vectors and ES stored doc vectors: 0.3377
Mean top-50 token overlap: 0.1064

Direct _infer comparison:

Query vector:
  local dims: 42
  ES infer dims: 14
  cosine: 0.4688

First chunk vector:
  local dims: 241
  ES infer dims: 20
  cosine: 0.3191

Earlier smaller DnD smoke test, 40 chunks:

Local doc dims mean: 623.65
ES stored dims mean: 275.5
Vector cosine mean: 0.753
Top-10 overlap:
  local vs ES online: 7/10
  local vs ES precomputed: 7/10
  ES online vs ES precomputed: 10/10

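The full-book test exposes a much larger mismatch: mean cosine falls from 0.753 to 0.3377, and mean ES stored dims fall from 275.5 to 59.258, while mean local dims stay in the same range (623.65 vs. 537.848).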
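
For reference, the comparison metrics reported in this section can be computed along these lines. This is a sketch under assumptions: both vectors are {token: weight} dicts, and "top-k overlap" is taken to mean the share of one vector's k highest-weighted tokens that also appear in the other's top k. The function names are hypothetical, and other overlap definitions (for example Jaccard) would give different numbers:

```python
import math


def cosine(a, b):
    """Cosine similarity between two {token: weight} sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k_overlap(a, b, k=50):
    """Fraction of a's k highest-weighted tokens that also appear in b's top k."""
    top_a = set(sorted(a, key=a.get, reverse=True)[:k])
    top_b = set(sorted(b, key=b.get, reverse=True)[:k])
    if not top_a:
        return 0.0
    return len(top_a & top_b) / len(top_a)


# Toy vectors (illustrative only): the "ES" vector keeps one of three tokens.
local_vec = {"wall": 2.0, "layer": 1.0, "red": 0.5}
es_vec = {"wall": 2.0}

dims_local, dims_es = len(local_vec), len(es_vec)
similarity = cosine(local_vec, es_vec)
overlap = top_k_overlap(local_vec, es_vec, k=2)
```

"Dims" here is simply the number of non-zero token entries in each vector.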

Questions For Elastic

  1. Is naver/splade-v3 imported through Eland as text_expansion expected to produce vectors equivalent to sentence_transformers.SparseEncoder("naver/splade-v3").encode_document() and .encode_query()?

  2. Does Elasticsearch text_expansion inference apply any internal output pruning, thresholding, normalization, top-k filtering, or token filtering even when all of the following are set?

    • sparse-vector mapping uses "index_options": {"prune": false}
    • search query uses "prune": false
    • "index.mapping.exclude_source_vectors": false

  3. Is Eland correctly handling the SentenceTransformers SparseEncoder pooling/activation logic for naver/splade-v3, or is it only exporting the underlying masked-language-model head without the same SPLADE post-processing used by SentenceTransformers?

  4. Is there a recommended Eland import configuration for naver/splade-v3 to preserve full SPLADE token-weight parity?

  5. Should this model be imported differently, for example with a local exported model folder, different --max-model-input-length, a different task type, a custom inference config, or a supported Elastic-native SPLADE/text-expansion model?

  6. Why does Elasticsearch sparse search with a locally precomputed query vector still miss the answer that local SPLADE ranks #1? Is the indexed document-vector representation expected to differ this much?

  7. What is the recommended production path if we need SPLADE-v3 parity with local SentenceTransformers ranking: Eland import, custom ingest of externally computed sparse vectors, or another Elastic-supported sparse model?
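
Background for question 3: SPLADE is usually described as taking the masked-language-model logits, applying log(1 + ReLU(x)) to each one, and max-pooling over token positions. A minimal pure-Python sketch of that step on toy logits follows. `splade_pool` is a hypothetical name, and this is not a claim about what Eland or SentenceTransformers actually execute internally:

```python
import math


def splade_pool(logits, attention_mask):
    """Max-pool log(1 + ReLU(logit)) over unmasked positions.

    logits: list of per-position lists, shape [seq_len][vocab_size].
    attention_mask: list of 0/1 flags, one per position.
    Returns a list of vocab_size non-negative weights.
    """
    vocab_size = len(logits[0])
    pooled = [0.0] * vocab_size
    for position, mask in zip(logits, attention_mask):
        if not mask:
            continue  # padding positions contribute nothing
        for j, logit in enumerate(position):
            activated = math.log1p(max(logit, 0.0))  # log(1 + ReLU(logit))
            if activated > pooled[j]:
                pooled[j] = activated
    return pooled


# Toy logits: three positions, three vocabulary entries; the last position is padding.
toy_logits = [
    [1.0, -2.0, 0.0],
    [3.0, 0.5, -1.0],
    [9.0, 9.0, 9.0],
]
toy_mask = [1, 1, 0]
weights = splade_pool(toy_logits, toy_mask)
```

If an exported model returned raw logits or skipped this pooling, the resulting vectors would differ in both size and weights from those produced with it, which is the scenario question 3 asks about.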