Using a Python Tokenizer

(Chris Brown) #1

Hi there!

Wondering if anyone has tried using a custom tokenizer written in Python?

What are some possible approaches? I'm guessing it would need to be added via a custom plugin?

We currently have one offered to us for segmenting a few Asian languages, and are wondering if it's possible to use it with Elasticsearch.


(Ivan Brusic) #2

Custom analysis code/plugins are run within the same JVM used by
Elasticsearch (with a separate classloader), so the code must be written
in a JVM language. This limitation is due to the fact that the code is
required by the Lucene library, which is written in Java. I have written
plugins in Java and Scala. Python only works as a scripting language.
Someone please correct me if I am wrong, there always seems to be a
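
Since the analysis chain itself has to run in the JVM, one approach sometimes used for non-JVM segmenters is to run the tokenizer outside Elasticsearch, before indexing, and send the already-segmented text to a field that uses the built-in `whitespace` analyzer. The following is only a minimal sketch of that idea, not something proposed in this thread: `toy_segmenter` is a stand-in for the real Python tokenizer, and the field name `body_tokens` and the typeless mapping layout (as used by recent Elasticsearch versions) are assumptions for illustration.

```python
import json
from typing import Callable, Dict, List

# Hypothetical index body: the `whitespace` analyzer splits only on
# whitespace, so token boundaries chosen in Python are preserved.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "body": {"type": "text"},
            "body_tokens": {"type": "text", "analyzer": "whitespace"},
        }
    }
}


def pretokenize(text: str, segment: Callable[[str], List[str]]) -> str:
    """Run the external segmenter and join its tokens with single spaces."""
    tokens = (token.strip() for token in segment(text))
    return " ".join(token for token in tokens if token)


def build_document(text: str, segment: Callable[[str], List[str]]) -> Dict[str, str]:
    """Keep the raw text and add a pre-segmented copy for searching."""
    return {"body": text, "body_tokens": pretokenize(text, segment)}


def toy_segmenter(text: str) -> List[str]:
    """Placeholder only: emits one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


if __name__ == "__main__":
    # The resulting dict is what would be sent to Elasticsearch as the
    # document source; no client call is made in this sketch.
    print(json.dumps(build_document("東京 都", toy_segmenter), ensure_ascii=False))
```

With this layout, query strings would need to pass through the same `pretokenize` step before being searched against `body_tokens`, and any token that itself contains whitespace would be split again by the analyzer.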

(Chris Brown) #3

Thanks Ivan! Everything I've found online is telling me the same.
