Configuring a custom plugin

edwarddale · January 18, 2011, 5:23pm

Hello,

I've developed a custom plugin that wraps the
DictionaryCompoundWordTokenFilterFactory provided by Lucene. This
TokenFilter requires a word list file, which is provided through the
settings. After talking with kimchy yesterday, I made this setting
optional so that ES can instantiate the factory up front without
throwing an exception. ES creates these instances beforehand to speed
up the document indexing process.

My problem now is that there are a number of instances of my Filter
instantiated and only one of them has the path to the word list file
configured. Unfortunately, when I index or analyze a document, an
instance without the word list is used to analyze.

Does anyone know how to configure a custom plugin that requires a
parameter?

I've pushed my changes to github [1] and this is the ES configuration
that I'm trying to use. For some reason, I can't use
'dictionary_decompounder' as the type for the filter, but that's a
separate problem.

index :
    analysis :
        analyzer :
            myAnalyzer2 :
                tokenizer : standard
                filter : [dict1, standard, lowercase, stop]
        filter :
            dict1 :
                type : org.elasticsearch.index.analysis.DictionaryCompoundWordTokenFilterFactory
                word_list_path : /Users/Edward/Downloads/word.german

Thanks!
Edward

[1] https://github.com/scompt/elasticsearch/tree/compound

kimchy · January 18, 2011, 6:40pm

The filter should be created only once per index. Are you saying that it gets created several times, and that some of those instances are created without the parameter being passed?

edwarddale · January 19, 2011, 7:41am

The FilterFactory is being instantiated a number of times, only one of which
picks up the word list setting. The create method is only being called once
on one of those instances, and not the one with the configured word list.
I've confirmed this by some low-tech debug statements in the FilterFactory
class.
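
For anyone wanting to reproduce this kind of check, here is a rough sketch of what such debug statements might look like. It is plain Java with no Elasticsearch dependency; the class name TracedDecompounderFactory, the instance counter, and the String returned by create() are all hypothetical stand-ins, not the code from the plugin.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a token filter factory; not the real ES class.
final class TracedDecompounderFactory {
    // Counts constructions so each instance gets a recognisable number.
    private static final AtomicInteger INSTANCES = new AtomicInteger();

    final int id = INSTANCES.incrementAndGet();
    final String wordListPath; // null when the setting was not passed

    TracedDecompounderFactory(Map<String, String> settings) {
        this.wordListPath = settings.get("word_list_path");
        System.err.println("factory #" + id + " constructed, word_list_path=" + wordListPath);
    }

    // Stand-in for create(TokenStream); reports which instance is really used.
    String create() {
        System.err.println("factory #" + id + " create() called, word_list_path=" + wordListPath);
        if (wordListPath == null) {
            throw new IllegalStateException("factory #" + id + " has no word_list_path");
        }
        return "decompounder using " + wordListPath;
    }

    public static void main(String[] args) {
        // One instance built without settings, one built with them.
        TracedDecompounderFactory unconfigured =
                new TracedDecompounderFactory(Collections.emptyMap());
        TracedDecompounderFactory configured = new TracedDecompounderFactory(
                Collections.singletonMap("word_list_path", "/path/to/words.txt"));
        System.out.println(configured.create());
        try {
            unconfigured.create();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Comparing the instance number printed by the constructor with the one printed by create() shows whether the instance being used is the one that received the setting.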

Cheers,
Edward

kimchy · January 20, 2011, 4:26pm

Strange that it's being picked up only once. I will have a look...

kimchy · January 21, 2011, 12:01am

I just pushed a simplification for token filters that require settings. On master, have a look at the new PhoneticTokenFilterFactory, which also requires an encoder to be set. It is annotated with @AnalysisSettingsRequired to indicate that it requires settings, so ES will not pre-cache it when they are not set.
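
To illustrate the idea (this is not the actual ES implementation), here is a minimal, self-contained sketch in plain Java. The annotation @SettingsRequired is a hypothetical stand-in for @AnalysisSettingsRequired, and PreCacheSketch, FilterFactory, LowercaseFactory, DecompounderFactory, preCache and create are names assumed for the example only. The pre-caching step skips any factory type that carries the marker annotation, so a factory that needs settings is only ever built with them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PreCacheSketch {

    // Hypothetical marker, standing in for ES's @AnalysisSettingsRequired.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface SettingsRequired {}

    interface FilterFactory { String describe(); }

    // Needs no settings, so it is safe to build ahead of time.
    static final class LowercaseFactory implements FilterFactory {
        LowercaseFactory(Map<String, String> settings) {}
        public String describe() { return "lowercase"; }
    }

    // Cannot work without word_list_path, so it is marked as such.
    @SettingsRequired
    static final class DecompounderFactory implements FilterFactory {
        private final String wordListPath;
        DecompounderFactory(Map<String, String> settings) {
            wordListPath = settings.get("word_list_path");
            if (wordListPath == null) {
                throw new IllegalArgumentException("word_list_path is required");
            }
        }
        public String describe() { return "decompounder(" + wordListPath + ")"; }
    }

    // Builds a factory reflectively through its (Map) constructor.
    static FilterFactory create(Class<? extends FilterFactory> type,
                                Map<String, String> settings) {
        try {
            return type.getDeclaredConstructor(Map.class).newInstance(settings);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create " + type.getSimpleName(), e);
        }
    }

    // Pre-creates only the factories that do not demand settings.
    static Map<String, FilterFactory> preCache(List<Class<? extends FilterFactory>> types) {
        Map<String, FilterFactory> cache = new LinkedHashMap<>();
        for (Class<? extends FilterFactory> type : types) {
            if (type.isAnnotationPresent(SettingsRequired.class)) {
                continue; // built later, once real settings are available
            }
            cache.put(type.getSimpleName(),
                    create(type, Collections.<String, String>emptyMap()));
        }
        return cache;
    }

    public static void main(String[] args) {
        List<Class<? extends FilterFactory>> types = new ArrayList<>();
        types.add(LowercaseFactory.class);
        types.add(DecompounderFactory.class);
        System.out.println("pre-cached: " + preCache(types).keySet());

        Map<String, String> settings =
                Collections.singletonMap("word_list_path", "/path/to/words.txt");
        System.out.println(create(DecompounderFactory.class, settings).describe());
    }
}
```

With a marker like this, the word list setting can stay mandatory in the factory's constructor instead of being made optional just to survive pre-instantiation.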
