Hi
I'm scraping product pages where each product has a description and multiple images. I generate a vector for each image. Finally, I want to search for images similar to a given image and return the products those images belong to.
What is the best design for this?
Should I create a complete document for each image with the product data? (e.g. a product with 5 images would have 5 documents with the same description)
Or should I create one document per product, with each image name and its vector nested inside the main document?
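Here is a minimal sketch of the two layouts the question describes, written as plain Python dicts. The field names (`product_id`, `description`, `images`, `image_name`, `vector`) and the sample data are hypothetical. In Elasticsearch the vector field would typically be mapped as `dense_vector`, and for the second option `images` would be a `nested` field.

```python
from typing import Any

Vector = list[float]


def denormalized_docs(product_id: str, description: str,
                      images: dict[str, Vector]) -> list[dict[str, Any]]:
    """Option 1: one document per image, each repeating the product data."""
    return [
        {
            "product_id": product_id,
            "description": description,  # duplicated in every image document
            "image_name": name,
            "vector": vector,
        }
        for name, vector in images.items()
    ]


def nested_doc(product_id: str, description: str,
               images: dict[str, Vector]) -> dict[str, Any]:
    """Option 2: one document per product with its images nested inside."""
    return {
        "product_id": product_id,
        "description": description,  # stored only once
        "images": [
            {"image_name": name, "vector": vector}
            for name, vector in images.items()
        ],
    }


# Hypothetical product with 5 images (toy 3-dimensional vectors).
images = {f"img{i}.jpg": [float(i), 1.0, 0.0] for i in range(5)}
docs = denormalized_docs("p1", "Red cotton T-shirt", images)
doc = nested_doc("p1", "Red cotton T-shirt", images)
print(len(docs), len(doc["images"]))  # 5 5
```

With 5 images the first layout stores the description 5 times. The second stores it once, but each product document grows with its number of images, which is the trade-off the question is asking about.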
I don't know much about the search performance of nested fields with vectors. But considering that, and also document size (a product can have several images, which increases the size of the document), I would go for an option with two indexes.
One index would hold the products, and the other would hold the vectorized images of each product, one document per image (if a product has 5 images, there would be 5 documents). This lets you manage the two indexes separately, both during data ingestion and at search time.
For vector search, you query the image index and then enrich the results with the data from the product index.
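A minimal in-memory sketch of that two-index flow, assuming the image index holds one record per image (`product_id`, `vector`) and the product index is keyed by `product_id`. Brute-force cosine similarity stands in for the search engine's kNN query, and the dictionary lookup stands in for a batched fetch of products by id (for example a multi-get). All names here are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_products(query: list[float], image_index: list[dict],
                    product_index: dict[str, dict], k: int = 3) -> list[dict]:
    # Step 1: vector search against the image index only.
    hits = sorted(
        ((cosine(query, img["vector"]), img["product_id"]) for img in image_index),
        reverse=True,
    )
    # Step 2: keep the best-scoring image per product, up to k products.
    best: dict[str, float] = {}
    for score, pid in hits:
        if pid not in best:
            best[pid] = score
        if len(best) == k:
            break
    # Step 3: enrich with product data using one batched lookup by id.
    return [{**product_index[pid], "score": score} for pid, score in best.items()]


# Hypothetical data: p1 has two images, p2 has one.
image_index = [
    {"product_id": "p1", "vector": [1.0, 0.0]},
    {"product_id": "p1", "vector": [0.9, 0.1]},
    {"product_id": "p2", "vector": [0.0, 1.0]},
]
product_index = {
    "p1": {"product_id": "p1", "description": "Red shirt"},
    "p2": {"product_id": "p2", "description": "Blue jeans"},
}
results = search_products([1.0, 0.0], image_index, product_index, k=2)
print([r["product_id"] for r in results])  # ['p1', 'p2']
```

In this sketch the enrichment step is a single lookup of at most `k` product ids after the vector search, so the extra cost is one additional round trip per query.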
There are two things I want to weigh. The first is big data and denormalization. The second is performance: on every search we first have to query one index and then join the results with the other index.
So which is better?
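For comparison, here is a minimal sketch of the single-index, nested alternative. Each product document carries its image vectors, a product's score is taken as the best similarity among its images, and the result already contains the product data, so no second lookup is needed. Brute-force scoring stands in for the engine's kNN query, the field names are hypothetical, and how a real engine scores nested vectors should be checked in its documentation.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_nested(query: list[float], products: list[dict], k: int = 3) -> list[dict]:
    # Score each product by its best-matching image (products without images are skipped).
    scored = (
        (max(cosine(query, img["vector"]) for img in p["images"]), p)
        for p in products
        if p["images"]
    )
    # Keep the k highest-scoring products; key avoids comparing the dicts.
    top = heapq.nlargest(k, scored, key=lambda t: t[0])
    return [
        {"product_id": p["product_id"], "description": p["description"], "score": s}
        for s, p in top
    ]


# Hypothetical nested product documents.
products = [
    {"product_id": "p1", "description": "Red shirt", "images": [
        {"image_name": "a.jpg", "vector": [1.0, 0.0]},
        {"image_name": "b.jpg", "vector": [0.9, 0.1]},
    ]},
    {"product_id": "p2", "description": "Blue jeans", "images": [
        {"image_name": "c.jpg", "vector": [0.0, 1.0]},
    ]},
]
results = search_nested([1.0, 0.0], products, k=1)
print(results[0]["product_id"])  # p1
```

This avoids the second lookup, at the cost of product documents that grow with the number of images, which is the document-size concern raised in the reply.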