Language Detector Processor in Elasticsearch Ingest pipeline

Hello,

Is there any way I can detect the different languages present in a message or text field through an ingest pipeline?

Regards,
Rohan

There is no out-of-the-box language detection functionality in Elasticsearch, but the good news is that there is an open source plugin available at: https://github.com/spinscale/elasticsearch-ingest-langdetect. This plugin provides a processor that you can use in an ingest pipeline to detect the language of a field in your documents.
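
For illustration, here is a minimal sketch of what a pipeline definition using that processor might look like, built in Python. The processor name `langdetect` and its `field` / `target_field` options are written from memory of the plugin's README, so treat them as assumptions and check the README for your plugin version. The pipeline name `langdetect-pipeline` and the field names `message` and `language` are hypothetical.

```python
import json


def build_langdetect_pipeline(field, target_field):
    """Return an ingest pipeline body with a single language detection processor."""
    return {
        "description": "Detect the language of a text field",
        "processors": [
            {
                # Assumption: processor name and option names as documented
                # in the plugin README; verify against your plugin version.
                "langdetect": {
                    "field": field,
                    "target_field": target_field,
                }
            }
        ],
    }


# Hypothetical field names: read from "message", write the result to "language".
pipeline = build_langdetect_pipeline("message", "language")

# This JSON would be the request body for:
#   PUT _ingest/pipeline/langdetect-pipeline
print(json.dumps(pipeline, indent=2))
```

Documents indexed with `?pipeline=langdetect-pipeline` would then be expected to carry the detected language code in the target field.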

While it's not an official plugin, it has been built by @spinscale, who is an Elasticsearch developer.

One minor limitation: right now the plugin only returns the language with the highest probability. It does not split the text into chunks, so it cannot find different languages within one field.

But the plugin is open source, so feel free to provide patches or fork it, if you need to extend it!
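
Short of patching the plugin, one possible client-side workaround is to split the text into chunks yourself and run each chunk through the pipeline as its own document. The sketch below is only an illustration, not something the plugin provides: it splits naively on sentence-ending punctuation and builds a request body for the ingest simulate API. The helper names, the `message` field, and the pipeline name in the comment are hypothetical, and very short chunks generally make language detection less reliable.

```python
import json
import re


def split_into_chunks(text):
    """Naively split text into sentence-like chunks after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def build_simulate_body(text, field="message"):
    """Build a simulate request body with one document per chunk."""
    return {
        "docs": [{"_source": {field: chunk}} for chunk in split_into_chunks(text)]
    }


body = build_simulate_body("This is English. Das ist Deutsch. Ceci est du français.")

# This JSON would be the request body for (hypothetical pipeline name):
#   POST _ingest/pipeline/langdetect-pipeline/_simulate
print(json.dumps(body, indent=2, ensure_ascii=False))
```

Each document in the simulate response could then be inspected for its detected language, giving one result per chunk rather than one per field.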

--Alex

Thanks @abdon & @spinscale for the information!
