Eland-imported naver/splade-v3 text_expansion produces much smaller sparse vectors and worse ranking than local SentenceTransformers SparseEncoder
Environment
- Elasticsearch version: 9.4.2
- Eland Docker image used: docker.elastic.co/eland/eland:9.2.0
- Hugging Face model: naver/splade-v3
- Elasticsearch model id: naver-splade-v3-m512
- Task type: text_expansion
- Max model input length: 512
- Quantization: not enabled
- Prefix strings: none
- Test corpus: full DnD5eSRD.pdf, parsed into 875 chunks
Eland Import Command
docker run --rm -it "$ELAND_IMAGE" \
eland_import_hub_model \
--url "$ES_URL" \
--es-username "$ES_USER" \
--es-password "$ES_PASSWORD" \
--hub-model-id naver/splade-v3 \
--hub-access-token "$HF_TOKEN" \
--task-type text_expansion \
--es-model-id naver-splade-v3-m512 \
--max-model-input-length 512 \
--start
Elasticsearch Model Config Observed
{
"model_id": "naver-splade-v3-m512",
"model_type": "pytorch",
"inference_config": {
"text_expansion": {
"vocabulary": {
"index": ".ml-inference-native-000002"
},
"tokenization": {
"bert": {
"do_lower_case": true,
"with_special_tokens": true,
"max_sequence_length": 512,
"truncate": "first",
"span": -1
}
}
}
},
"deployment_ids": ["naver-splade-v3-m512"]
}
Mapping
We explicitly disabled sparse-vector index pruning and source-vector exclusion:
{
"settings": {
"number_of_shards": 1,
"index.mapping.exclude_source_vectors": false
},
"mappings": {
"properties": {
"doc_id": { "type": "keyword" },
"chunk_id": { "type": "integer" },
"title": { "type": "text", "analyzer": "spanish" },
"source_file": { "type": "keyword" },
"text_sha1": { "type": "keyword" },
"text": { "type": "text", "analyzer": "spanish" },
"splade_tokens": {
"type": "sparse_vector",
"index_options": {
"prune": false
}
}
}
}
}
Ingest Pipeline
{
"processors": [
{
"inference": {
"model_id": "naver-splade-v3-m512",
"input_output": [
{
"input_field": "text",
"output_field": "splade_tokens"
}
]
}
}
]
}
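For comparison, the custom-ingest option raised under Questions For Elastic would bypass this pipeline and write locally computed token weights into splade_tokens directly. The following is a minimal sketch of the _bulk request body. It assumes a hypothetical index name (dnd-splade), uses chunk ids as document _ids, and expects the {token: weight} dicts to be computed locally already. It also assumes the index has no default_pipeline, because that pipeline would run the inference processor and could overwrite the field:

```python
import json
from typing import Dict, Iterable, Tuple

# (document _id, chunk text, {token: weight}) triples computed outside Elasticsearch
Chunk = Tuple[str, str, Dict[str, float]]


def bulk_body(index: str, chunks: Iterable[Chunk]) -> str:
    """Build an NDJSON _bulk body that indexes precomputed SPLADE weights directly."""
    lines = []
    for doc_key, text, tokens in chunks:
        # Action line, then the document source line
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_key}}))
        # Zero weights carry no signal for a dot-product score, so they are dropped;
        # other mapped fields (title, doc_id, chunk_id, ...) are omitted in this sketch
        source = {"text": text, "splade_tokens": {t: w for t, w in tokens.items() if w > 0}}
        lines.append(json.dumps(source))
    # The _bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n"


body = bulk_body("dnd-splade", [("dnd5esrd-0378", "Red: at least 25 Cold damage", {"cold": 1.2, "red": 0.9})])
print(body)
```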
Query Methods Tested
1. Elasticsearch online sparse query
Elasticsearch encodes the query using the deployed model:
{
"query": {
"sparse_vector": {
"field": "splade_tokens",
"inference_id": "naver-splade-v3-m512",
"query": "What are the specific methods for destroying each of the seven layers of a Prismatic Wall?",
"prune": false
}
}
}
2. Elasticsearch sparse query with a locally precomputed query vector
Python computes the query vector with:
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-v3")
query_embedding = model.encode_query([query], convert_to_tensor=True, convert_to_sparse_tensor=False)[0]
Then sends:
{
"query": {
"sparse_vector": {
"field": "splade_tokens",
"query_vector": {
"...": "local SPLADE token weights"
},
"prune": false
}
}
}
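The "..." placeholder above stands for a {token: weight} map built from the dense query tensor. Below is a minimal sketch of that conversion. It assumes the tensor has already been turned into a plain list (for example query_embedding.tolist()). It also assumes vocab holds the token string for each vocabulary id, for instance from the tokenizer's convert_ids_to_tokens. That lookup is our assumption and is not stated in the steps above. The function names are hypothetical:

```python
from typing import Dict, List, Sequence


def to_query_vector(weights: Sequence[float], vocab: List[str]) -> Dict[str, float]:
    """Turn one dense, vocabulary-sized SPLADE row into a {token: weight} dict."""
    if len(weights) != len(vocab):
        raise ValueError("weights and vocab must have the same length")
    # Only non-zero entries contribute to a dot-product score
    return {vocab[i]: float(w) for i, w in enumerate(weights) if w > 0}


def sparse_vector_query(field: str, query_vector: Dict[str, float]) -> dict:
    # Same request shape as method 2, with pruning disabled
    return {"query": {"sparse_vector": {"field": field, "query_vector": query_vector, "prune": False}}}


# Toy example; in practice weights would come from query_embedding and vocab from the tokenizer
vocab = ["[PAD]", "wall", "prismatic", "layer"]
body = sparse_vector_query("splade_tokens", to_query_vector([0.0, 1.5, 2.25, 0.0], vocab))
```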
3. Local SentenceTransformers SPLADE ranking
Python computes document and query vectors locally:
from sentence_transformers import SparseEncoder
model = SparseEncoder("naver/splade-v3")
doc_embeddings = model.encode_document(docs, convert_to_tensor=True, convert_to_sparse_tensor=False)
query_embedding = model.encode_query([query], convert_to_tensor=True, convert_to_sparse_tensor=False)[0]
# SPLADE relevance = dot product of the query and document term-weight vectors, one score per chunk
scores = doc_embeddings @ query_embedding
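The dot-product scoring of method 3 can also be written over {token: weight} dicts. That is the same shape Elasticsearch stores in splade_tokens and accepts as query_vector. Using it, ES-stored document vectors could be re-scored locally against the local query vector (or the other way round) to see which side moves the answer chunks. A hypothetical pure-Python sketch:

```python
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]


def sparse_dot(q: SparseVec, d: SparseVec) -> float:
    # Iterate over the smaller dict; only tokens present in both vectors contribute
    if len(q) > len(d):
        q, d = d, q
    return sum(w * d[t] for t, w in q.items() if t in d)


def rank(query: SparseVec, docs: Dict[str, SparseVec], k: int = 10) -> List[Tuple[str, float]]:
    scored = [(doc_id, sparse_dot(query, vec)) for doc_id, vec in docs.items()]
    # Highest score first; ties broken by doc id for a deterministic order
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]
```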
Main Result
Question:
What are the specific methods for destroying each of the seven layers of a Prismatic Wall?
The correct answer is in chunks dnd5esrd-0378 and dnd5esrd-0379.
The answer text includes:
The wall ... can be destroyed one layer at a time, in order from red to violet...
Red: at least 25 Cold damage
Orange: strong wind, such as Gust of Wind
Yellow: at least 60 Force damage
Green: Passwall or equal/higher spell opening a portal on a solid surface
Blue: at least 25 Fire damage
Indigo: Bright Light from Daylight
Violet: Dispel Magic
Ranking comparison:
- Local SentenceTransformers SPLADE: dnd5esrd-0378 ranked #1 (score 20.1291), dnd5esrd-0379 ranked #10 (score 9.0233)
- Elasticsearch sparse, online query inference: answer missed in top 10
- Elasticsearch sparse, locally precomputed query vector: answer missed in top 10
- BM25 multi_match: dnd5esrd-0378 ranked #2
This suggests the issue is not limited to query inference in ES. Even when the query vector is computed locally, ES sparse search misses the answer. That points to the indexed document vectors: the vectors generated by ES inference at ingest time differ substantially from the local SentenceTransformers document vectors, as the measurements below show.
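The rank positions above, including "missed answer in top 10", can be read off each method's ordered list of chunk ids with a small helper. The names are hypothetical, and each hit list is assumed to be ordered by descending score:

```python
from typing import Iterable, Optional, Sequence


def rank_of(chunk_id: str, ranked_ids: Sequence[str]) -> Optional[int]:
    """1-based position of chunk_id in a ranked hit list, or None if it was not returned."""
    for position, hit in enumerate(ranked_ids, start=1):
        if hit == chunk_id:
            return position
    return None


def answer_in_top_k(answer_ids: Iterable[str], ranked_ids: Sequence[str], k: int = 10) -> bool:
    # True if any chunk containing the answer appears among the first k hits
    top = set(ranked_ids[:k])
    return any(a in top for a in answer_ids)
```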
Vector Size / Similarity Evidence
Full-book run over 875 chunks:
- Local doc vector dims, mean: 537.848
- ES stored doc vector dims, mean: 59.258
- Mean cosine between local doc vectors and ES stored doc vectors: 0.3377
- Mean top-50 token overlap: 0.1064
Direct _infer comparison:
- Query vector: local dims 42, ES _infer dims 14, cosine 0.4688
- First chunk vector: local dims 241, ES _infer dims 20, cosine 0.3191
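The dims, cosine, and top-50 overlap figures above can be computed on {token: weight} dicts with helpers along these lines. The local vectors would be converted from the dense tensors, and the ES vectors read from the stored splade_tokens field or the _infer response. The post does not define its overlap metric, so top_k_overlap assumes the size of the intersection of the two top-k token sets divided by k. With that definition, the score is capped whenever a vector has fewer than k non-zero tokens, as some of the ES vectors here do. All names are hypothetical:

```python
import math
from typing import Dict

SparseVec = Dict[str, float]


def dims(v: SparseVec) -> int:
    # Number of tokens carrying a non-zero weight
    return sum(1 for w in v.values() if w != 0)


def cosine(a: SparseVec, b: SparseVec) -> float:
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    # Define cosine as 0.0 when either vector is empty
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_overlap(a: SparseVec, b: SparseVec, k: int = 50) -> float:
    # Assumed definition: tokens shared by each vector's k heaviest tokens, divided by k
    top_a = set(sorted(a, key=a.get, reverse=True)[:k])
    top_b = set(sorted(b, key=b.get, reverse=True)[:k])
    return len(top_a & top_b) / k
```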
Earlier, smaller DnD smoke test (40 chunks):
- Local doc dims mean: 623.65
- ES stored dims mean: 275.5
- Vector cosine mean: 0.753
- Top-10 overlap: local vs ES online 7/10; local vs ES precomputed 7/10; ES online vs ES precomputed 10/10
The full-book test exposes a much larger mismatch than the 40-chunk smoke test (mean cosine 0.3377 vs 0.753; ES stored dims mean 59.258 vs 275.5).
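The top-10 overlap counts in the smoke test compare which chunk ids two methods return, ignoring order. A minimal sketch, assuming each method's hits are available as an ordered list of chunk ids (the helper names are hypothetical):

```python
from typing import Sequence


def top_k_list_overlap(a: Sequence[str], b: Sequence[str], k: int = 10) -> int:
    # Chunk ids returned by both methods within their first k hits; order is ignored
    return len(set(a[:k]) & set(b[:k]))


def overlap_label(a: Sequence[str], b: Sequence[str], k: int = 10) -> str:
    # Formatted like the smoke-test figures, e.g. "7/10"
    return f"{top_k_list_overlap(a, b, k)}/{k}"
```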
Questions For Elastic
- Is naver/splade-v3, imported through Eland as text_expansion, expected to produce vectors equivalent to sentence_transformers.SparseEncoder("naver/splade-v3").encode_document() and .encode_query()?
- Does Elasticsearch text_expansion inference apply any internal output pruning, thresholding, normalization, top-k filtering, or token filtering, even when:
  - the sparse-vector mapping uses "index_options": {"prune": false}
  - the search query uses "prune": false
  - the index sets "index.mapping.exclude_source_vectors": false
- Is Eland correctly handling the SentenceTransformers SparseEncoder pooling/activation logic for naver/splade-v3, or is it only exporting the underlying masked-language-model head without the same SPLADE post-processing used by SentenceTransformers?
- Is there a recommended Eland import configuration for naver/splade-v3 that preserves full SPLADE token-weight parity?
- Should this model be imported differently, for example from a locally exported model folder, with a different --max-model-input-length, a different task type, or a custom inference config? Or should we use a supported Elastic-native SPLADE/text-expansion model instead?
- Why does Elasticsearch sparse search with a locally precomputed query vector still miss the answer that local SPLADE ranks #1? Is the indexed document-vector representation expected to differ this much?
- What is the recommended production path if we need SPLADE-v3 parity with local SentenceTransformers ranking: Eland import, custom ingest of externally computed sparse vectors, or another Elastic-supported sparse model?
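The pooling/activation question turns on the SPLADE post-processing. For reference, the published SPLADE max-pooling formulation computes each vocabulary weight as w_j = max over unmasked positions i of log(1 + ReLU(z_ij)), where z_ij is the masked-language-model logit for token j at position i. As we understand it, this is what the SentenceTransformers SpladePooling module applies for this model. Below is a toy pure-Python sketch of that step, which could help when checking an exported model's raw logits against the local encoder. It illustrates the formula only and is not Eland's or Elasticsearch's code:

```python
import math
from typing import List, Sequence


def splade_pool(logits: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> List[float]:
    """w_j = max_i log(1 + relu(logits[i][j])), taken only over positions with mask 1."""
    vocab_size = len(logits[0])
    weights = [0.0] * vocab_size
    for row, keep in zip(logits, attention_mask):
        if not keep:
            continue  # padding positions must not contribute to the max
        for j, z in enumerate(row):
            # ReLU then log1p; the result is always >= 0, so 0.0 is a valid starting max
            w = math.log1p(max(z, 0.0))
            if w > weights[j]:
                weights[j] = w
    return weights


# Toy input: 3 positions x 3 vocabulary entries; the last position is padding
logits = [[1.0, -2.0, 0.0], [0.0, 3.0, -1.0], [10.0, 10.0, 10.0]]
weights = splade_pool(logits, [1, 1, 0])
```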