Hello,
I recently attended a Vector & AI workshop on vector embeddings, and a demo on image relevance search particularly caught my attention. My goal now is to embed images directly in our data pipeline during ingestion.
My main question is about the logistics of embedding images at ingest time. As I set up an inference pipeline, the first hurdle is: how do I ingest the image in the first place? Is base64-encoding the bytes, the way HTML files and PDFs are handled with the attachment processor, the right approach? The sketch below shows roughly what I have in mind.
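To make the question concrete, this is the kind of ingest-time flow I'm hoping is possible. The pipeline name, the model ID (`my-image-embedding-model`), and the assumption that an inference processor can consume a base64 image field are all hypothetical; whether anything like this is actually supported is exactly what I'm asking.

```python
import base64
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical ingest pipeline: an inference processor that would turn a
# base64-encoded image field into a dense vector at ingest time. Whether a
# deployed model can accept image bytes like this is the open question.
es.ingest.put_pipeline(
    id="image-embedding-pipeline",
    processors=[
        {
            "inference": {
                "model_id": "my-image-embedding-model",  # hypothetical model ID
                "field_map": {"image_data": "input"},     # assumed input mapping
                "target_field": "image_embedding",
            }
        }
    ],
)

# Base64-encode the raw image bytes, the same way documents are prepared
# for the attachment processor.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Index the document through the pipeline so the embedding (ideally) gets
# added server-side, with no extra backend in between.
es.index(
    index="images",
    pipeline="image-embedding-pipeline",
    document={"filename": "photo.jpg", "image_data": image_b64},
)
```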
I first looked at the "How to implement image similarity search in Elasticsearch" blog post, but the method there embeds images with an external Python script (roughly like the second sketch below), which doesn't fit our current cloud/self-managed deployment. We'd prefer to embed images during ingestion itself and avoid additional backend processing.
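For contrast, this is my understanding of the external step the blog post describes: a standalone script computes the embedding with a CLIP-style model and writes the vector into the index before or outside the ingest pipeline. The model choice and field names here are only illustrative.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# External embedding step outside Elasticsearch: a CLIP-style model
# produces the vector before the document is indexed.
model = SentenceTransformer("clip-ViT-B-32")  # illustrative model choice
embedding = model.encode(Image.open("photo.jpg"))

es.index(
    index="images",
    document={
        "filename": "photo.jpg",
        "image_embedding": embedding.tolist(),  # stored as a dense_vector field
    },
)
```

This is the extra backend component we're hoping to eliminate.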
So my question remains: is there a viable way to embed images during ingestion without relying on an external backend? Any insights or alternative approaches would be much appreciated as we work toward a cleaner integration of image embedding in our data pipeline.
Thanks,
Chenko