Hello, I need to retrieve documents through a "join" property.
Let me explain:
I have two kinds of documents in my index "test":
images and chunks. They are formatted like this:
{
"type":"image" or "chunk",
"content":"some textual content",
"imageVector":[the image dense vector],
"textVector":[the chunk dense vector],
"path":"a textual path",
"decription":"a textual description",
"joinKey":"a unique id that some images shared with some chunks"
}
Note: they don't have a parent-child relation.
I want to run a hybrid search (a lexical "syntax" part plus a vector part) on my images (so filtering on the "type" field = "image"):
the syntax part has to match the "path" field (with a boost of 0.8) and some keywords in the "description" field (with a boost of 0.2).
the vector part is a kNN search on the "imageVector" field.
I want to weight the syntax part at 0.3 and the kNN part at 0.7.
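To make the image-side search concrete, here is a minimal sketch of the request body I have in mind. It only builds a plain object and sends nothing. The function name `buildImageHybridSearch` and its parameters are hypothetical, and it assumes that "type" is mapped as a keyword, that "imageVector" is an indexed dense_vector, that the field is really named "description", and that the top-level `knn` option can be combined with `query` so that the two boosted scores are summed. Since the text scores are not normalized, the 0.3 / 0.7 boosts are only approximate weights.

```javascript
// Sketch only: builds the body of a _search request for the image-side hybrid search.
// buildImageHybridSearch and its parameter names are hypothetical.
function buildImageHybridSearch({ pathText, keywords, imageQueryVector, k = 10 }) {
  // Assumes "type" is a keyword field, so an exact term filter works.
  const imagesOnly = { term: { type: "image" } };
  return {
    size: k,
    _source: ["joinKey"], // only the join key is needed from the image hits
    query: {
      bool: {
        filter: [imagesOnly],
        should: [
          { match: { path: { query: pathText, boost: 0.8 } } },
          { match: { description: { query: keywords, boost: 0.2 } } },
        ],
        minimum_should_match: 1, // assumption: at least one text clause must match
        boost: 0.3, // weight of the syntax part
      },
    },
    knn: {
      field: "imageVector",
      query_vector: imageQueryVector,
      k,
      num_candidates: 10 * k, // assumption: arbitrary candidate pool size
      filter: imagesOnly,
      boost: 0.7, // weight of the kNN part
    },
  };
}
```

With the official JavaScript client, the returned object could be spread into `client.search({ index: "test", ...body })`, or pasted as JSON into the Kibana console.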
BUT I want the result of this hybrid search to be not the images themselves but the associated chunks, linked through the "joinKey" field they have in common.
Is there a way to achieve that in a single search?
Thank you for your help.
(PS: I use JavaScript, or can test in the Kibana console.)
The chunks are literal pieces of text stored in the "content" field. Here is an example of an image record:
{
"type":"image",
"content":"some textual content describing the image",
"imageVector":[1,2,-6,9,1,0,....], => the vectorized image using models like CLIP
"textVector":[0.2,-0.6,0.8...], => the vectorized field "content" of the image record
"path":"my_path",
"decription":"my_image_12",
"joinKey":"1234" => the record id of a "text" type record
}
This record has id "4567".
Here is an example of the linked "text" record:
{
"type":"text",
"content":"blablabla", => the chunk part, textual text of a document
"imageVector":empty,
"textVector":[0.6,-0.4,0.7...], => the vectorized field "content" of the textrecord
"path":"my_path",
"decription":"some_description",
"joinKey":empty
}
This record has id "1234".
So my hybrid search retrieves the image record with id "4567" (because its path and description fields match the "syntax" criteria and its imageVector field matches the kNN criteria), and thanks to the "joinKey" field I want to return the record with id "1234" to the user (instead of "4567").
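For reference, the mapping from image hits to chunks can be sketched as a client-side join. This is a two-request fallback, not the single search asked about. All three helper names are hypothetical, and the sketch assumes that "joinKey" holds the `_id` of the linked record, as in the example above, so that an `ids` query can fetch the chunks.

```javascript
// Sketch only: client-side join from image hits to their linked chunk records.
// All helper names are hypothetical.

// Collect the distinct joinKey values of the image hits, keeping their ranking order.
function joinKeysFromImageHits(hits) {
  const seen = new Set();
  const keys = [];
  for (const hit of hits) {
    const key = hit._source && hit._source.joinKey;
    if (key && !seen.has(key)) {
      seen.add(key);
      keys.push(key);
    }
  }
  return keys;
}

// Build the body of the second request, assuming joinKey is the chunk's _id.
function buildChunkLookup(joinKeys) {
  return { size: joinKeys.length, query: { ids: { values: joinKeys } } };
}

// Put the fetched chunks back into the order of the image ranking.
function orderChunksByImageRank(joinKeys, chunkHits) {
  const byId = new Map(chunkHits.map((hit) => [hit._id, hit]));
  return joinKeys.filter((id) => byId.has(id)).map((id) => byId.get(id));
}
```

With the example records, the image hit "4567" yields the join key "1234", and the second request returns the record "1234" to show to the user.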