Configuring a custom plugin


(edwarddale) #1

Hello,

I've developed a custom plugin that wraps the
DictionaryCompoundWordTokenFilterFactory provided by Lucene. This
TokenFilter requires a word list file, which is provided through the
settings. After talking with kimchy yesterday, I made this setting
optional so that the filter can be instantiated by ES without throwing
an exception. ES instantiates filters beforehand to speed up the
document indexing process.

My problem now is that there are a number of instances of my Filter
instantiated and only one of them has the path to the word list file
configured. Unfortunately, when I index or analyze a document, an
instance without the word list is used to analyze.

Does anyone know how to configure a custom plugin that requires a
parameter?
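
The failure mode described above can be sketched in plain Java. This is a minimal illustration, not the Elasticsearch API: `DecompounderFactory`, its `Map`-based settings, and the placeholder path are all hypothetical stand-ins. It shows why an optional setting lets a settings-free instance exist, and how a guard in `create()` makes it obvious when that instance is the one being used.

```java
import java.util.Map;

// Hypothetical stand-in for a token filter factory; not the Elasticsearch API.
final class DecompounderFactory {
    // null when the factory was constructed without any settings
    private final String wordListPath;

    DecompounderFactory(Map<String, String> settings) {
        // The setting is optional at construction time, so an
        // unconfigured instance can be created without an exception.
        this.wordListPath = settings.get("word_list_path");
    }

    boolean isConfigured() {
        return wordListPath != null;
    }

    // Fail loudly instead of silently analyzing without a dictionary.
    String create() {
        if (wordListPath == null) {
            throw new IllegalStateException("word_list_path is not set on this instance");
        }
        return "decompounder(" + wordListPath + ")";
    }
}

final class OptionalSettingDemo {
    public static void main(String[] args) {
        // Instance built without settings, as a pre-instantiation step might do.
        DecompounderFactory unconfigured = new DecompounderFactory(Map.of());
        // Instance built from the index settings (placeholder path).
        DecompounderFactory configured = new DecompounderFactory(
                Map.of("word_list_path", "/path/to/word.list"));

        System.out.println(configured.create());
        try {
            unconfigured.create();
        } catch (IllegalStateException e) {
            System.out.println("unconfigured instance rejected: " + e.getMessage());
        }
    }
}
```

With a guard like this, analyzing through the wrong instance raises an error immediately rather than producing tokens that were never decompounded.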

I've pushed my changes to GitHub [1], and this is the ES configuration
that I'm trying to use. For some reason, I can't use
'dictionary_decompounder' as the type for the filter, but that's a
separate problem.

index :
    analysis :
        analyzer :
            myAnalyzer2 :
                tokenizer : standard
                filter : [dict1, standard, lowercase, stop]
        filter :
            dict1 :
                type : org.elasticsearch.index.analysis.DictionaryCompoundWordTokenFilterFactory
                word_list_path : /Users/Edward/Downloads/word.german

Thanks!
Edward

[1] https://github.com/scompt/elasticsearch/tree/compound


(Shay Banon) #2

The filter should be created only once per index. Are you saying that it gets created several times, and that some of those instances are created without the parameter being passed?


(edwarddale) #3

The FilterFactory is being instantiated a number of times, only one of which
picks up the word list setting. The create method is only being called once
on one of those instances, and not the one with the configured word list.
I've confirmed this by some low-tech debug statements in the FilterFactory
class.
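
For anyone wanting to reproduce this kind of check, here is a rough sketch of such debug statements in plain Java. `TracedFactory` is a hypothetical class, not the code in the linked branch: it numbers each instance in the constructor and logs that number again in `create()`, so the log shows which instance is actually used and whether it saw the setting.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory instrumented with low-tech tracing.
final class TracedFactory {
    // Counts every instance constructed in this JVM.
    private static final AtomicInteger CREATED = new AtomicInteger();

    private final int id;
    private final String wordListPath; // may be null if no setting was passed

    TracedFactory(String wordListPath) {
        this.id = CREATED.incrementAndGet();
        this.wordListPath = wordListPath;
        System.err.println("factory #" + id + " constructed, word_list_path=" + wordListPath);
    }

    // Logs which instance is asked to create a filter and returns the log line.
    String create() {
        String line = "factory #" + id + " create() called, word_list_path=" + wordListPath;
        System.err.println(line);
        return line;
    }

    static int instancesCreated() {
        return CREATED.get();
    }
}
```

If the `create()` line reports `word_list_path=null` while a different instance number was constructed with the path, the configured instance is not the one being used for analysis.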

Cheers,
Edward


(edwarddale) #4

The FilterFactory is being instantiated a number of times, only one of
which picks up the word list setting. The create method is only being
called once on one of those instances, and not the one with the
configured word list. I've confirmed this by some low-tech debug
statements in the FilterFactory class.

Cheers,
Edward



(Shay Banon) #5

Strange that it's being picked up only once. I will have a look...


(Shay Banon) #6

I just pushed a simplification for token filters that require settings. If you look at master, check the new PhoneticTokenFilterFactory, which also requires a setting (an encoder). It is annotated with @AnalysisSettingsRequired to indicate that it requires settings, so ES will not pre-cache it if they are not set.
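
The general idea of a marker annotation that opts a factory out of pre-caching can be sketched in plain Java. This is only an illustration of the mechanism, not how ES implements it: `NeedsSettings`, `LowercaseFactory`, `WordListFactory`, and `PreCache` are hypothetical names, and the real annotation is the @AnalysisSettingsRequired mentioned above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation; stands in for the idea behind
// @AnalysisSettingsRequired. RUNTIME retention makes it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface NeedsSettings {}

// A factory that works without any settings.
final class LowercaseFactory {}

// A factory that is useless without settings, so it is marked.
@NeedsSettings
final class WordListFactory {}

final class PreCache {
    // Returns only the factory classes that are safe to instantiate
    // ahead of time, i.e. those not marked as requiring settings.
    static List<Class<?>> eligible(List<Class<?>> candidates) {
        List<Class<?>> result = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            if (!candidate.isAnnotationPresent(NeedsSettings.class)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

Under this scheme, a marked factory is only ever constructed from an index's own settings, so there is no unconfigured instance to be picked up by mistake.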

