Hi everyone,
I created an Elasticsearch index with a dense_vector
field, along with some text fields. The mapping looks like this (simplified):
{"mappings": {"properties": {"embedding": {"type": "dense_vector","dims": 3,"index": true,"index_options": { "type": "int8_hnsw" }},"title": { "type": "text" },"text": { "type": "text" }}}}
When I read this index in Spark using the Elasticsearch for Apache Hadoop connector:
df = (
    spark.read.format("es")
    .option("es.nodes", "172.22.10.20")
    .option("es.port", "9200")
    .option("es.nodes.wan.only", "true")
    .load("bbb")
)
df.printSchema()
the output only shows:
root
|-- text: string (nullable = true)
|-- title: string (nullable = true)
The embedding (dense_vector) field is completely missing.
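For completeness, fetching the mapping directly confirms the field is still there on the Elasticsearch side (it has to be, given the mapping above), which suggests it is the connector's schema discovery that drops it. A quick check against the same node as above:

import requests

# The embedding field appears here, but not in the Spark schema
print(requests.get("http://172.22.10.20:9200/bbb/_mapping").json())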
My Questions

- Is dense_vector officially unsupported in the ES-Hadoop connector?
- The documentation on supported field mappings doesn't mention vector types. Does that mean they are silently ignored?
- Is there any workaround to read these fields into Spark (e.g., as arrays of floats), or is duplicating the field into a regular float array the only option? (One possible workaround is sketched below.)
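For the last question, this is the kind of workaround I had in mind: bypass the connector, pull the documents with the official elasticsearch Python client, and build the DataFrame by hand. Only a rough sketch, assuming the vector is still present in _source and the cluster is reachable as above; it also loses the connector's partitioned, distributed reads:

from elasticsearch import Elasticsearch, helpers
from pyspark.sql.types import (
    ArrayType, FloatType, StringType, StructField, StructType,
)

es = Elasticsearch("http://172.22.10.20:9200")

# Scroll over the whole index; the dense_vector values come back
# as part of _source even though the connector skips the field.
rows = [
    (
        hit["_source"].get("title"),
        hit["_source"].get("text"),
        [float(x) for x in hit["_source"].get("embedding", [])],
    )
    for hit in helpers.scan(es, index="bbb", query={"query": {"match_all": {}}})
]

schema = StructType([
    StructField("title", StringType()),
    StructField("text", StringType()),
    StructField("embedding", ArrayType(FloatType())),
])

df = spark.createDataFrame(rows, schema)
df.printSchema()  # embedding now shows up as array<float>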
Thanks in advance for clarifying!