I am working on semantic search and want to store the processed results of scraped website data. Each website has a title, summary, sub_section_headings and sub_section_data. I store title and summary as dense vectors,
and sub_section_headings and sub_section_data as nested dense vectors
(sub_section_data has more vectors than sub_section_headings). My mapping is below (Elasticsearch version 7.10):
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "title_vector": {"type": "dense_vector", "dims": 768},
            "summary_vector": {"type": "dense_vector", "dims": 768},
            "sub_section_headings_vectors": {
                "type": "nested",
                "properties": {"vector": {"type": "dense_vector", "dims": 768}}
            },
            "sub_section_data_vectors": {
                "type": "nested",
                "properties": {"vector": {"type": "dense_vector", "dims": 768}}
            }
        }
    }
Please suggest a way to search all of these fields in a single query, or at least to search multiple nested vector fields at once.
My current queries are below:
    # for nested fields, sub_section_headings & sub_section_data
    'query': {
        'nested': {
            'path': 'sub_section_headings_vectors',
            'score_mode': 'max',
            'query': {'function_score': {'script_score': {'script': {
                'source': '(1.0+cosineSimilarity(params.query_vector, params.field))',
                'params': {'field': 'sub_section_headings_vectors.vector', 'query_vector': [0.32, 0.89, ...]}}}}}
        }
    }
    # for title & summary
    'query': {
        'script_score': {
            'query': {'exists': {'field': 'title_vector'}},
            'script': {
                'source': '(1.0+cosineSimilarity(params.query_vector, params.field))',
                'params': {'field': 'title_vector', 'query_vector': [0.32, 0.89, ...]}
            }
        }
    }
Currently I am doing four searches, one for each field. How can I optimise this? Please help.
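
One possible direction, as a rough sketch rather than a confirmed solution: the four clauses could be placed under a single bool query as should clauses, so that one request covers every field. The helper names below (vector_script, top_level_clause, nested_clause, build_combined_query) are hypothetical, the field names are taken from the mapping above, and the body has not been run against a cluster.

```python
from typing import Any, Dict, List

# Same scoring expression as in the queries above; the field name is passed via params.
SCRIPT_SOURCE = "(1.0+cosineSimilarity(params.query_vector, params.field))"


def vector_script(field: str, query_vector: List[float]) -> Dict[str, Any]:
    """Build the script block shared by every clause (hypothetical helper)."""
    return {
        "source": SCRIPT_SOURCE,
        "params": {"field": field, "query_vector": query_vector},
    }


def top_level_clause(field: str, query_vector: List[float]) -> Dict[str, Any]:
    """script_score clause for a top-level dense_vector field."""
    return {
        "script_score": {
            "query": {"exists": {"field": field}},
            "script": vector_script(field, query_vector),
        }
    }


def nested_clause(path: str, query_vector: List[float]) -> Dict[str, Any]:
    """nested clause that keeps the best-scoring vector under the given path."""
    return {
        "nested": {
            "path": path,
            "score_mode": "max",
            "query": {
                "function_score": {
                    "script_score": {
                        "script": vector_script(f"{path}.vector", query_vector)
                    }
                }
            },
        }
    }


def build_combined_query(query_vector: List[float]) -> Dict[str, Any]:
    """Combine all four clauses under one bool/should (assumed approach)."""
    return {
        "query": {
            "bool": {
                "should": [
                    top_level_clause("title_vector", query_vector),
                    top_level_clause("summary_vector", query_vector),
                    nested_clause("sub_section_headings_vectors", query_vector),
                    nested_clause("sub_section_data_vectors", query_vector),
                ],
                "minimum_should_match": 1,
            }
        }
    }
```

A bool query adds up the scores of the should clauses that match, so a document's score here would be the sum of the per-field similarities. If the best single field should decide the ranking instead, the same four clauses could be placed in the queries list of a dis_max query.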