Using a Python Tokenizer

chbrown · January 30, 2018, 4:02pm

Hi there!

Wondering if anyone has tried using a custom tokenizer written in Python?

What are some possible approaches? I'm guessing it would need to be added via a custom plugin?

We currently have one offered to us for segmenting a few Asian languages, and are wondering if it's possible to use it with Elasticsearch.

Chris

Ivan · January 30, 2018, 4:23pm

Custom analysis code/plugins run within the same JVM as Elasticsearch
(with a separate classloader), so the code must be written in a JVM
language. This limitation exists because analysis code is called by the
Lucene library, which is written in Java. I have written plugins in
Java and Scala. Python has only been usable as a scripting language,
not for analysis code. Someone please correct me if I am wrong; there
always seems to be a workaround.
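
One workaround that might fit, sketched here under assumptions rather than as a tested recipe: run the Python tokenizer outside Elasticsearch, before indexing, and send the already-segmented text to a field that uses the built-in whitespace analyzer. In the sketch below, segment is a hypothetical stand-in for the real tokenizer, the field names body and body_tokens are made up for illustration, and the mapping uses the typeless form from newer Elasticsearch versions (older versions need a type name under "mappings").

```python
import json


def segment(text):
    """Hypothetical stand-in for the real Python tokenizer.

    Replace this with the actual segmenter; here it only splits on whitespace.
    """
    return text.split()


def pretokenize(text, tokenizer=segment):
    # Join the tokens with single spaces so that the whitespace analyzer
    # splits the field on exactly the boundaries the tokenizer chose.
    return " ".join(tok for tok in tokenizer(text) if tok.strip())


# Index body (assumed field names): the raw text is kept in "body",
# the pre-segmented copy goes into "body_tokens".
index_body = {
    "mappings": {
        "properties": {
            "body": {"type": "text"},
            "body_tokens": {"type": "text", "analyzer": "whitespace"},
        }
    }
}


def make_document(text, tokenizer=segment):
    # Build the JSON document to index; no Elasticsearch client is needed here.
    return {"body": text, "body_tokens": pretokenize(text, tokenizer)}


if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print(json.dumps(make_document("some text to index"), ensure_ascii=False))
```

Query strings would need to go through the same pretokenize step before searching body_tokens, otherwise the query and the indexed tokens will not line up.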

chbrown · January 30, 2018, 4:32pm

Thanks Ivan! Everything I've found online is telling me the same.

