Using a Python Tokenizer

Hi there!

Wondering if anyone has tried using a custom tokenizer written in python?

What are some possible approaches? I'm guessing it would need to be added via a custom plugin?

We currently have one offered to us for segmenting a few Asian languages, and are wondering if its possible to use with elasticsearch.


Custom analysis code/plugins are run within the same JVM used by
Elasticsearch (jwith a separate classloader), so the code must be written
in a JVM language. This limitation is due to the fact that the code is
required by the Lucene library, which is written in Java. I have written
plugins in Java and Scala. Python only works as a scripting language.
Someone please correct me if I am wrong, there always seems to be a

Thanks Ivan! Everything I've found online is telling me the same.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.