Is it possible to get a vector embedding for an image from an inference endpoint using the sentence-transformers__clip-vit-b-32-multilingual-v1 model? And if so, what format should the input be?
The documentation for the inference API says the input should be a string, so I suspect this is not possible. But perhaps a base64-encoded image might work?
I'm not sure how I would test this myself: if I send it a base64-encoded string it will return a vector, but I have no way of knowing whether that vector represents the image or just the text of the string.
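One way to check, assuming you can run the CLIP models locally with the sentence-transformers library, is to compare the vector the endpoint returns for the base64 string against a locally computed image embedding and a locally computed text embedding of that same string; whichever one it matches closely tells you how the endpoint interpreted the input. This is only a sketch; the image path and the endpoint vector are placeholders.

```python
import base64
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# clip-ViT-B-32 encodes images; the multilingual model encodes text into the same space.
image_model = SentenceTransformer("clip-ViT-B-32")
text_model = SentenceTransformer("sentence-transformers/clip-ViT-B-32-multilingual-v1")

image_path = "example.jpg"  # hypothetical test image
with open(image_path, "rb") as f:
    b64_string = base64.b64encode(f.read()).decode("utf-8")

image_vec = image_model.encode(Image.open(image_path))  # true image embedding
text_vec = text_model.encode(b64_string)                # base64 treated as plain text

# endpoint_vec = ...  # vector returned by the inference endpoint for b64_string
# If the endpoint actually decoded the image, its similarity to image_vec should be
# near 1.0; if it just embedded the string as text, it will match text_vec instead.
# print(util.cos_sim(endpoint_vec, image_vec), util.cos_sim(endpoint_vec, text_vec))
print(util.cos_sim(text_vec, image_vec))  # text-vs-image similarity, for reference
```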
No, it's not possible yet. I hope to see that happen at some point, as I'd love to run inference on my audio files directly within Elastic instead of writing my own Python code.
I would also be interested in using the inference API to get a vector embedding for an image file. I tried using the sentence-transformers__clip-vit-b-32-multilingual-v1 model locally in a Python service, but I am struggling to wrap that service in a Docker container. The resulting Docker image is very big and takes forever to deploy to the Kubernetes cluster.
Do you know if this is on the roadmap, and if so, any idea when it will be available?
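On the image-size problem: a common cause is that pip pulls in the default CUDA build of PyTorch, which adds several gigabytes. A rough sketch of a slimmer build, assuming a CPU-only service and a hypothetical app.py that wraps the model:

```dockerfile
# Sketch of a slimmer image for a CLIP embedding service (CPU-only assumption).
FROM python:3.11-slim

# Install the CPU-only PyTorch wheel first so sentence-transformers does not
# pull the much larger CUDA build as a dependency.
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu \
 && pip install --no-cache-dir sentence-transformers fastapi uvicorn

# Pre-download the model at build time so the container starts without network access;
# alternatively, mount a model cache volume at runtime to keep the image smaller.
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/clip-ViT-B-32-multilingual-v1')"

WORKDIR /app
COPY app.py /app/app.py  # hypothetical service wrapping the model

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```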