here is a small plugin for the upcoming 0.20 Elasticsearch release. It is a
language detector plugin, a revamped version of the implementation of
Nakatani Shuyo's language detector at http://code.google.com/p/language-detection/
It offers a REST endpoint where a short text can be posted to in UTF-8, and
Elasticsearch responds with a list of recognized languages.
Currently, it does not provide automatic language-aware indexing. It is
just for your convenience. You have to evaluate the response by yourself
and take the appropriate language-dependent action.
Note: it does not work on Elasticsearch 0.19 because of API changes in the
REST action implementation.
here is a small plugin for the upcoming 0.20 Elasticsearch release. It is a language detector plugin, a revamped version of the implementation of Nakatani Shuyo's language detector at http://code.google.com/p/language-detection/
The plugin is available at https://github.com/jprante/elasticsearch-langdetect
It offers a REST endpoint where a short text can be posted to in UTF-8, and Elasticsearch responds with a list of recognized languages.
Currently, it does not provide automatic language-aware indexing. It is just for your convenience. You have to evaluate the response by yourself and take the appropriate language-dependent action.
Note: it does not work on Elasticsearch 0.19 because of API changes in the REST action implementation.
Le vendredi 23 novembre 2012 02:21:59 UTC+1, Jörg Prante a écrit :
Hi,
here is a small plugin for the upcoming 0.20 Elasticsearch release. It is
a language detector plugin, a revamped version of the implementation of
Nakatani Shuyo's language detector at http://code.google.com/p/language-detection/
It offers a REST endpoint where a short text can be posted to in UTF-8,
and Elasticsearch responds with a list of recognized languages.
Currently, it does not provide automatic language-aware indexing. It is
just for your convenience. You have to evaluate the response by yourself
and take the appropriate language-dependent action.
Note: it does not work on Elasticsearch 0.19 because of API changes in the
REST action implementation.
Le vendredi 23 novembre 2012 02:21:59 UTC+1, Jörg Prante a écrit :
Hi,
here is a small plugin for the upcoming 0.20 Elasticsearch release. It is a language detector plugin, a revamped version of the implementation of Nakatani Shuyo's language detector at http://code.google.com/p/language-detection/
The plugin is available at https://github.com/jprante/elasticsearch-langdetect
It offers a REST endpoint where a short text can be posted to in UTF-8, and Elasticsearch responds with a list of recognized languages.
Currently, it does not provide automatic language-aware indexing. It is just for your convenience. You have to evaluate the response by yourself and take the appropriate language-dependent action.
Note: it does not work on Elasticsearch 0.19 because of API changes in the REST action implementation.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant logo are trademarks of the Apache Software Foundation in the United States and/or other countries.