Hi,
I'm trying to load vector values into Elasticsearch. The load finishes without any error,
but the resulting mapping shows the field type as "float", not "dense_vector".
I'm using Elasticsearch 7.5.0.
This is the mapping the index ends up with:
{
  "test" : {
    "mappings" : {
      "properties" : {
        "text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "text_feature" : {
          "type" : "float"
        }
      }
    }
  }
}
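To make the mismatch explicit, here is a minimal sketch that compares the field types I declared with the ones the index reports. The helper name `diff_field_types` is hypothetical, and both dicts are copied by hand from this post rather than fetched from the cluster; in practice the actual mapping could probably be read with `es.indices.get_mapping(index='test')` instead.

```python
def diff_field_types(expected, actual):
    """Return {field: (expected_type, actual_type)} for every field whose type differs."""
    mismatches = {}
    for field, spec in expected.items():
        actual_type = actual.get(field, {}).get('type')
        if actual_type != spec.get('type'):
            mismatches[field] = (spec.get('type'), actual_type)
    return mismatches


# Field types declared in mapping.yaml (copied by hand from this post)
expected_properties = {
    'text': {'type': 'text'},
    'text_feature': {'type': 'dense_vector', 'dims': 768},
}

# Field types reported by the index (copied by hand from the mapping output above)
actual_properties = {
    'text': {'type': 'text'},
    'text_feature': {'type': 'float'},
}

print(diff_field_types(expected_properties, actual_properties))
# prints {'text_feature': ('dense_vector', 'float')}
```

Only `text_feature` differs, so the problem seems limited to the vector field.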
I'm using the following Python code for loading:
import csv
import os
import sys

import yaml
from elasticsearch import Elasticsearch, helpers

from my_vectorizer import MySentenceVectorizer

VEC = MySentenceVectorizer()


def create_index(index='test'):
    es = Elasticsearch()
    # Read the field definitions from the YAML file shown below
    with open('./mapping.yaml') as f:
        setting = yaml.load(f, Loader=yaml.SafeLoader)
    properties = setting['mappings']['properties']

    def generate_data():
        with open('./text.csv', 'r') as f:
            reader = csv.reader(f)
            attrs = next(reader)  # header row with the column names
            for lid, row in enumerate(reader):
                data = {
                    '_op_type': 'index',
                    '_index': index,
                }
                for j, value in enumerate(row):
                    # Copy only the columns that are defined in the mapping
                    if attrs[j] in properties:
                        data[attrs[j]] = value
                    # Vectorize the text column and store it in the vector field
                    if attrs[j] == 'text':
                        data['text_feature'] = VEC.vectorize(value).tolist()
                yield data

    print(helpers.bulk(es, generate_data()))


create_index()
The value of VEC.vectorize(value).tolist() is definitely a list of floats.
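A check along these lines is one way to confirm that. It is only a sketch: `is_float_vector` is a hypothetical helper, and `sample` is a stand-in for the real output of `VEC.vectorize(value).tolist()`, with the length of 768 taken from the `dims` setting in my mapping.

```python
def is_float_vector(vec, dims=768):
    """Return True if vec is a plain list of exactly `dims` Python floats."""
    return (
        isinstance(vec, list)
        and len(vec) == dims
        and all(isinstance(x, float) for x in vec)
    )


# Stand-in for VEC.vectorize(value).tolist(); the real values come from the vectorizer
sample = [0.0] * 768

print(is_float_vector(sample))      # prints True
print(is_float_vector([0.1, 0.2]))  # prints False (wrong length)
```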
My YAML settings (mapping.yaml) are as follows:
settings:
  index:
    analysis:
      analyzer:
        my_analyzer:
          type: custom
          tokenizer: kuromoji_tokenizer
          filter:
            - kuromoji_baseform
mappings:
  properties:
    text:
      type: text
      index: true
      fielddata: true
      analyzer: my_analyzer
    text_feature:
      type: dense_vector
      dims: 768
The vector values are generated by a BERT vectorizer.
I'm completely stuck on this.
I hope someone can help me.