Can ELSER be used with languages other than English?

Chenko · January 16, 2024, 4:18pm

Hello,

I'm posting to gather insights about ELSER's language capabilities, particularly its support for languages other than English. My interest lies in understanding how well ELSER handles the following languages:

Dutch (NL)
French (FR)
German (DE)

If ELSER does not currently support these languages, I'd be curious to learn about any future plans. Specifically, are there timelines or development stages aimed at adding Dutch, French, and German? A roadmap would be ideal.

Insights into the challenges and strategies for adapting ELSER to these languages would be highly appreciated.

On the other hand, if ELSER already boasts multilingual support, I'd love to know more about its performance. How does it fare in terms of accuracy and relevance when processing Dutch, French, and German compared to English? I'm keen on understanding any particular strengths or limitations ELSER might exhibit in these languages.

Thanks,
Chenko Mortier

camillo · January 21, 2024, 3:44pm

I second this, and I'm also interested in the performance for Italian (IT). As far as I've seen, ELSER is English-only and the E5 model should be used instead. However, I can only find ELSER in my trained models console.

valeriy42 · January 22, 2024, 9:39am

Hello,

Indeed, ELSER should be used for English documents, while for other languages we recommend using multilingual E5. Note that starting with 8.12 we support multilingual-e5-small as a first-class citizen.

camillo · January 22, 2024, 9:56am

Thank you. I can't find the E3 model in my trained models, but I can find the ELSER model. What could be the problem?

Chenko · January 22, 2024, 1:47pm

Hello, I can see the E5 model (not the E3) when I check my local dev environment with Elastic 8.12. It seems only E5 is supported. You could probably import the E3 model with eland; I think that should work.

Chenko · January 22, 2024, 1:49pm

Thanks for your response. Do you have any more information on how this compares with ELSER?

Also, will future versions allow better models? I ask because this model has "-small" in its name, which makes me think it is not the best one available.

camillo · January 22, 2024, 2:01pm

Yeah, sorry, I misspelled it; I meant the E5 model. I can only see two versions of ELSER and the language detection model. Maybe it's because I'm on Elastic 8.11? I'm running an Elastic Cloud hosted deployment.

Chenko · January 22, 2024, 2:13pm

Hello, yes, it seems to have been introduced in the recently released 8.12.0. I suggest upgrading so you can use it without eland.

JdKock · January 25, 2024, 6:07am

I'm very interested in this multilingual-e5 model and want to try it out. But what field type do I add to my index template for the embedding?

I tried the following:

"content_embedding": {
	"type": "dense_vector",
	"dims": 3,
	"element_type": "byte",
	"similarity": "dot_product"
}

But when I try to reindex, I get the following error:

failed to parse: element_type [byte] vectors only support non-decimal values but found decimal value [0.058786135] at dim [0]

dkyle · January 25, 2024, 11:32am

The element type should be float rather than byte, because the model produces float embeddings. Set dims to the size of the text embedding; for multilingual-e5-small that is 384:

"dims": 384,
"element_type": "float",
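
For context, here is a rough sketch of how the whole field mapping might look with those two values in place, written as a small Python helper that builds the request body. The helper name `build_embedding_mapping` is made up for illustration. The `cosine` similarity is an assumption on my part: `dot_product` on float vectors requires unit-length vectors, so `cosine` is the safer choice if you are not sure the embeddings are normalized.

```python
import json

# Embedding size of multilingual-e5-small, as stated in the reply above.
E5_SMALL_DIMS = 384


def build_embedding_mapping(field_name: str = "content_embedding") -> dict:
    """Build an index mapping body with one dense_vector field.

    Hypothetical helper for illustration; the field name comes from the
    question, the similarity setting is an assumption.
    """
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "dense_vector",
                    "dims": E5_SMALL_DIMS,      # must match the model output size
                    "element_type": "float",    # the model emits decimal values
                    "index": True,              # needed for kNN search on the field
                    "similarity": "cosine",     # assumption: safe for non-normalized vectors
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body, e.g. to paste into a create-index request.
    print(json.dumps(build_embedding_mapping(), indent=2))
```

The printed JSON can be used as the body of a create-index request or merged into an index template.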

JdKock · January 25, 2024, 12:36pm

Thanks, that helped. Now I can reindex everything.

And what about searching? Can I just use the following, like I did with ELSER?

{
	"text_expansion": {
		"content_embedding": {
			"model_text": "just a simple search query",
			"model_id": ".multilingual-e5-small_linux-x86_64",
			"boost": 1
		}
	}
}

dkyle · January 25, 2024, 1:02pm

For text embeddings, use kNN search rather than text_expansion.

There's an example here - click on Dense Vector Models
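
As a rough sketch of what such a request body could look like for the `content_embedding` field and model ID from the question, the Python helper below builds a `knn` search section that lets Elasticsearch embed the query text through `query_vector_builder`. The helper name `build_knn_search` and the default `k` and `num_candidates` values are assumptions chosen for illustration, not recommended settings.

```python
import json


def build_knn_search(
    query_text: str,
    field: str = "content_embedding",
    model_id: str = ".multilingual-e5-small_linux-x86_64",
    k: int = 10,
    num_candidates: int = 100,
) -> dict:
    """Build a _search request body that runs a kNN query.

    Hypothetical helper for illustration. The query text is embedded on the
    Elasticsearch side by the deployed text embedding model.
    """
    if num_candidates < k:
        # Elasticsearch rejects requests where num_candidates is smaller than k.
        raise ValueError("num_candidates must be >= k")
    return {
        "knn": {
            "field": field,                    # the dense_vector field to search
            "k": k,                            # number of nearest neighbours to return
            "num_candidates": num_candidates,  # candidates considered per shard
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,
                    "model_text": query_text,
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_knn_search("just a simple search query"), indent=2))
```

The resulting JSON goes in the body of a `_search` request against the index; the model has to be deployed for the query vector to be built.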
