Hello all,
Is it possible to get a vector embedding for an image from an inference endpoint using the sentence-transformers__clip-vit-b-32-multilingual-v1
model? And if so, what format should the input be?
The documentation on the inference API says the input should be a string, so I suspect this is not possible. But perhaps a base64-encoded image might work?
I'm not sure how to test this myself: if I send a base64-encoded string, I do get a vector back, but I have no way of knowing whether that vector represents the image or just the base64 text.
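To illustrate the kind of check I have in mind, here is a rough local sketch (assuming sentence-transformers is installed and a local image file cat.jpg; the model names are the Hugging Face ones, not the endpoint's name). It encodes the base64 string as text with the multilingual model and compares that against a real image embedding from the clip-ViT-B-32 image encoder, so I can see whether the endpoint's vector looks like the former or the latter:

```python
# Rough sketch: compare "base64 string embedded as text" vs. a genuine image embedding.
# Assumptions: sentence-transformers and Pillow installed, a local image "cat.jpg".
import base64
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

# Text encoder (the multilingual model) and the CLIP image encoder it was aligned to
text_model = SentenceTransformer("sentence-transformers/clip-ViT-B-32-multilingual-v1")
image_model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")

with open("cat.jpg", "rb") as f:
    b64_string = base64.b64encode(f.read()).decode("utf-8")

# What a text-only endpoint would presumably do with a base64 payload:
# embed it as an (eventually truncated) string of text
base64_as_text_vec = text_model.encode(b64_string)

# A genuine image embedding from the CLIP image encoder, for comparison
image_vec = image_model.encode(Image.open("cat.jpg"))

# Both vectors are 512-dimensional, so cosine similarity is directly comparable
cos = np.dot(base64_as_text_vec, image_vec) / (
    np.linalg.norm(base64_as_text_vec) * np.linalg.norm(image_vec)
)
print(f"cosine(base64-as-text, true image embedding) = {cos:.3f}")
```

My thinking is that if the vector from the endpoint matches the "base64 as text" vector rather than the image vector, the endpoint is just treating the input as a string.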