Default language for language detection

droberts195 · December 20, 2021, 3:01pm

Thanks for pointing out that deficiency.

I agree it's an issue and have opened elastic/elasticsearch issue #81933, "[ML] What to do about lang_ident for empty strings and numbers?", to discuss what to do.

In the short term you could add an extra set processor after the lang_ident inference processor that changes the predicted language field to en if the source field is an empty string. This set processor would use an if condition so that it only overrides the prediction for empty strings.
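As a rough sketch of that workaround, the Python snippet below builds a pipeline body you could send to the `PUT _ingest/pipeline/<name>` API. Several details are assumptions on my part and not something confirmed in this thread: that the text to classify is in a field called `text`, that the inference processor writes to `ml.inference` (set explicitly here through `target_field`), and that the prediction lands in `predicted_value` under that field. The function name `build_pipeline` is hypothetical. Adjust the names to match your own pipeline.

```python
import json


def build_pipeline(default_language="en"):
    """Return an ingest pipeline body: lang_ident inference, then a guarded set."""
    # Assumption: the document field holding the text is named "text".
    source_field = "text"
    # Assumption: inference results are written under this field.
    target_field = "ml.inference"

    return {
        "description": "Language identification with a default for empty strings",
        "processors": [
            {
                "inference": {
                    "model_id": "lang_ident_model_1",
                    "target_field": target_field,
                    "field_map": {},
                    "inference_config": {
                        "classification": {"num_top_classes": 1}
                    },
                }
            },
            {
                "set": {
                    # Painless condition: only fire when the source is an empty string.
                    "if": f"ctx.{source_field} == ''",
                    # Override the predicted language written by the inference processor.
                    "field": f"{target_field}.predicted_value",
                    "value": default_language,
                }
            },
        ],
    }


# Print the JSON body, ready to paste into a PUT _ingest/pipeline/<name> request.
print(json.dumps(build_pipeline(), indent=2))
```

Because the `set` processor runs after inference and is guarded by the `if`, documents with non-empty text keep whatever language the model predicted. You can check the behaviour with `POST _ingest/pipeline/_simulate` before using it on real data.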
