Can ELSER be used with languages other than English?

Hello,

I'm posting to gather insights about ELSER's language capabilities, particularly its support for languages other than English. My interest lies in understanding how well ELSER handles the following languages:

  • Dutch (NL)
  • French (FR)
  • German (DE)

If ELSER is currently not equipped to support these languages, I'd be curious to learn about any future plans. Specifically, are there any timelines or development stages aimed at integrating Dutch, French, and German? A roadmap would be ideal.

Insights into the challenges and strategies for adapting ELSER to these languages would be highly appreciated.

On the other hand, if ELSER already boasts multilingual support, I'd love to know more about its performance. How does it fare in terms of accuracy and relevance when processing Dutch, French, and German compared to English? I'm keen on understanding any particular strengths or limitations ELSER might exhibit in these languages.

Thanks,
Chenko Mortier

I second this and am also interested in the performance for Italian (IT). As far as I've seen, ELSER is English-only and the E5 model should be used instead; however, I can only find ELSER in my trained models console.

Hello,

Indeed, ELSER should be used for English documents, while for other languages we recommend using multilingual E5. Note that starting with 8.12 we support multilingual-e5-small as a first-class citizen.

Thank you. I can't find the E3 model in my trained models, but I can find the ELSER model. What could be the problem?

Hello, I can see the E5 model (not E3) when I check my local dev environment with Elastic 8.12. It seems only E5 is supported. You could try importing the E3 model with eland; I think that should work.

Thanks for your response. Do you have any more information on how this compares with ELSER?

Also, will future versions allow better models? (I see this model has "-small" in its name, which makes me think it is not the best.)

Yeah, sorry, I misspelled it; I meant the E5 model. I can only see two versions of ELSER and the language detection model. Maybe it's because I'm on Elastic 8.11? I'm running an Elastic Cloud hosted deployment.

Hello, yes, it seems to have been introduced in the recently released 8.12.0. I suggest upgrading so you can use it without eland.

I'm very interested in this multilingual-e5 model and want to try it out. But what field type do I add to my index template for the embedding?

I now tried the following:

 "content_embedding": { 
	"type": "dense_vector", 
	"dims": 3, 
	"element_type": "byte",
	"similarity": "dot_product" 
}

But when I try to reindex, I get the following error:

failed to parse: element_type [byte] vectors only support non-decimal values but found decimal value [0.058786135] at dim [0]

Element type should be float rather than byte, as the model produces a float embedding. Set dims to the size of the text embedding; for multilingual-e5-small that is 384:

"dims": 384,
"element_type": "float",
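
Putting that together, here is a minimal sketch of what the complete field mapping might look like, written as a small Python helper that only builds the JSON body (it does not talk to a cluster). The helper name `embedding_mapping` is hypothetical, and `"index": True` is spelled out as an assumption so the field is searchable with kNN regardless of the version default. The similarity is kept as `dot_product` from the mapping in the question; note that Elasticsearch requires float vectors to be unit length for `dot_product`, so `cosine` may be the safer choice if you are not sure the embeddings are normalized.

```python
import json

# multilingual-e5-small produces 384-dimensional float embeddings (per the reply above).
E5_SMALL_DIMS = 384


def embedding_mapping(field="content_embedding", dims=E5_SMALL_DIMS,
                      similarity="dot_product"):
    """Return the `properties` entry for a dense_vector field holding float embeddings.

    Hypothetical helper for illustration; the field name is taken from the question.
    """
    return {
        field: {
            "type": "dense_vector",
            "dims": dims,               # must match the model's embedding size
            "element_type": "float",    # the model emits decimals, so not "byte"
            "index": True,              # assumption: set explicitly to enable kNN search
            "similarity": similarity,   # "dot_product" needs unit-length vectors
        }
    }


if __name__ == "__main__":
    # Print a body that could be used when creating the index or index template.
    body = {"mappings": {"properties": embedding_mapping()}}
    print(json.dumps(body, indent=2))
```

Calling `embedding_mapping(similarity="cosine")` gives the same mapping with cosine similarity instead.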

Thanks that helped. Now I can reindex everything.

And what about searching now? Can I just use the following, like I did with ELSER:

{
	"text_expansion": {
		"content_embedding": {
			"model_text": "just a simple search query",
			"model_id": ".multilingual-e5-small_linux-x86_64",
			"boost": 1
		}
	}
}

For text embeddings, use kNN search.

There's an example here - click on Dense Vector Models
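
As a rough sketch of what the kNN equivalent of the `text_expansion` query above might look like: the `knn` search option accepts a `query_vector_builder` with a `text_embedding` entry, which asks the deployed model to embed the query text at search time. The Python helper below only builds the request body; the function name `knn_query` is hypothetical, and the `k` and `num_candidates` values are illustrative assumptions, not tuned recommendations.

```python
import json


def knn_query(text, field="content_embedding",
              model_id=".multilingual-e5-small_linux-x86_64",
              k=10, num_candidates=100):
    """Build a kNN search body that embeds `text` with the given model at query time.

    Hypothetical helper; field name and model id are taken from the question.
    """
    if num_candidates < k:
        # Elasticsearch rejects requests where num_candidates is smaller than k.
        raise ValueError("num_candidates must be >= k")
    return {
        "knn": {
            "field": field,                    # the dense_vector field to search
            "k": k,                            # number of nearest neighbours to return
            "num_candidates": num_candidates,  # candidates considered per shard
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,
                    "model_text": text,
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(knn_query("just a simple search query"), indent=2))
```

The printed body would be sent to the index's `_search` endpoint in place of the `text_expansion` query.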
