Hello,
Is there any way I can detect the different languages present in a message or text field through an ingest pipeline?
Regards,
Rohan
There is no out-of-the-box language detection functionality in Elasticsearch, but the good news is that there is an open source plugin available at https://github.com/spinscale/elasticsearch-ingest-langdetect. This plugin provides a processor that you can use in a pipeline to detect the language of a field in your documents.
While it's not an official plugin, it has been built by @spinscale, who is an Elasticsearch developer.
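For illustration, here is a minimal Python sketch that builds a pipeline body using such a processor. The processor name `langdetect` and the options `field` and `target_field` are assumptions based on my reading of the plugin's README, so please check them against the version you install. The field names `message` and `language` and the helper `build_langdetect_pipeline` are hypothetical.

```python
import json


def build_langdetect_pipeline(source_field, target_field):
    """Return an ingest pipeline body with a single language detection processor."""
    return {
        "description": "Detect the language of a text field",
        "processors": [
            {
                # Assumed processor name and options; verify against the plugin README.
                "langdetect": {
                    "field": source_field,
                    "target_field": target_field,
                }
            }
        ],
    }


# Hypothetical field names: read from "message", write the result to "language".
pipeline_body = build_langdetect_pipeline("message", "language")

# This JSON would be the request body for: PUT _ingest/pipeline/<your-pipeline-name>
print(json.dumps(pipeline_body, indent=2))
```

Documents indexed through that pipeline should then carry the detected language in the target field, provided the plugin is installed on every ingest node.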
One minor limitation: right now the plugin only returns the language with the highest probability. No chunking takes place, so it cannot find different languages within a single field.
But the plugin is open source, so feel free to provide patches or fork it, if you need to extend it!
--Alex
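Until something like that exists in the plugin, one possible workaround is to do the chunking on the client side before indexing, so that each chunk goes through the pipeline as its own document. The sketch below is not part of the plugin; the helper names, the `message` field, and the `parent_id` and `chunk_index` fields are all hypothetical, and the sentence splitting is deliberately naive.

```python
import re


def split_into_chunks(text):
    """Split text into sentence-like chunks after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def to_chunk_documents(doc_id, text, field="message"):
    """Build one document per chunk so each chunk can be language-detected separately."""
    return [
        {"parent_id": doc_id, "chunk_index": index, field: chunk}
        for index, chunk in enumerate(split_into_chunks(text))
    ]


docs = to_chunk_documents("1", "This is English. Das ist Deutsch. C'est du français.")
for doc in docs:
    print(doc)
```

Keep in mind that very short chunks are generally harder to classify, so the chunk size is a trade-off between granularity and accuracy.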
Thanks @abdon & @spinscale for the information!