Problem configuring PatternReplaceFilter


(Alexander Reelsen) #1

Hi there,

I am having trouble configuring the pattern replace filter.

My configuration looks like this:

index:
  analysis:
    analyzer:
      default:
        type: ae_analyzer
      ae_analyzer:
        type: custom
        tokenizer: standard
        filter: [umlaut_replace]
    filter:
      umlaut_replace:
        type: pattern_replace
        pattern: "ä"
        replacement: "a"
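For reference, a pattern replace filter runs a regular expression over each token and substitutes the matches. The sketch below imitates that per-token step with plain java.util.regex, assuming "replace every match" semantics. The class and method names are hypothetical, and this is not the actual Lucene or Elasticsearch implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration (not the Lucene/Elasticsearch class) of what
// pattern: "ä" with replacement: "a" does to each token of the stream.
final class PatternReplaceSketch {
    // Replaces every match of the regex in every token; the replacement string
    // follows Matcher.replaceAll rules, so "$1"-style group references work.
    static List<String> apply(List<String> tokens, String regex, String replacement) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            result.add(pattern.matcher(token).replaceAll(replacement));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up tokens, as a tokenizer might emit them.
        System.out.println(apply(List.of("Mädchen", "Bär", "Haus"), "ä", "a"));
        // prints [Madchen, Bar, Haus]
    }
}
```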

The exception I get on startup is:

INFO: An exception was caught and reported. Message:
org.elasticsearch.ElasticSearchIllegalArgumentException: Token Filter [umlaut_replace] must have a type associated with it
org.elasticsearch.ElasticSearchIllegalArgumentException: Token Filter [umlaut_replace] must have a type associated with it

Looking at the analysis module, I found this line, which refers to an org.elasticsearch package:

type = tokenFilterSettings.getAsClass("type", null, "org.elasticsearch.index.analysis.", "TokenFilterFactory");

However, the PatternReplaceFilter class itself lives in an org.apache package...

Might this be the cause, or am I simply misconfiguring something badly?

Regards, Alexander
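Judging only from the arguments of that getAsClass call (an assumption: the real lookup may differ between versions and may consult a registry of built-in names first), the type value seems to be turned into a class name inside org.elasticsearch.index.analysis by camel-casing it and appending "TokenFilterFactory". If so, Elasticsearch would look for its own factory class, which wraps the Lucene filter, so the filter's org.apache location would not matter by itself. A version that ships no such factory class would presumably have nothing to resolve the type to. The hypothetical helper below reproduces only that naming step and checks whether the class is on the classpath.

```java
// Hypothetical reconstruction of the class-name convention suggested by
// getAsClass("type", null, "org.elasticsearch.index.analysis.", "TokenFilterFactory").
final class TypeNameSketch {
    // "pattern_replace" -> "PatternReplace": drop underscores, upper-case each word start.
    static String toUpperCamel(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        boolean upperNext = true;
        for (char c : value.toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else {
                sb.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return sb.toString();
    }

    static String factoryClassName(String type, String prefixPackage, String suffix) {
        return prefixPackage + toUpperCamel(type) + suffix;
    }

    public static void main(String[] args) {
        String name = factoryClassName("pattern_replace",
                "org.elasticsearch.index.analysis.", "TokenFilterFactory");
        System.out.println(name); // org.elasticsearch.index.analysis.PatternReplaceTokenFilterFactory
        try {
            Class.forName(name);
            System.out.println("factory class found");
        } catch (ClassNotFoundException e) {
            System.out.println("no such factory class on this classpath");
        }
    }
}
```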


(Shay Banon) #2

Can you do a get settings request to see whether the type is really there for the filter? (Note that settings get munged into key/value pairs.) Also, although I would love to help fix this, for this use case you might want to consider the asciifolding filter instead (http://www.elasticsearch.org/guide/reference/index-modules/analysis/asciifolding-tokenfilter.html).

On Mon, Aug 1, 2011 at 7:08 PM, Alexander Reelsen <alexander.reelsen@googlemail.com> wrote:

Might this be the cause, or am I simply misconfiguring something badly?
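To illustrate the note about key/value pairs: nested YAML settings end up as flat, dot-separated keys, so the entry to look for in the get settings output would be something like index.analysis.filter.umlaut_replace.type. The flattener below is a hypothetical sketch of that transformation, not Elasticsearch's settings loader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not Elasticsearch's settings loader): flatten nested
// settings into dot-separated key/value pairs.
final class SettingsFlattener {
    static Map<String, String> flatten(Map<String, ?> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<String, ?> node, Map<String, String> flat) {
        for (Map.Entry<String, ?> entry : node.entrySet()) {
            String key = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Recurse into nested sections such as analysis -> filter -> umlaut_replace.
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                flattenInto(key, child, flat);
            } else {
                flat.put(key, String.valueOf(value));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> umlautReplace = new LinkedHashMap<>();
        umlautReplace.put("type", "pattern_replace");
        umlautReplace.put("pattern", "ä");
        umlautReplace.put("replacement", "a");
        Map<String, Object> settings = Map.of("index",
                Map.of("analysis", Map.of("filter", Map.of("umlaut_replace", umlautReplace))));
        flatten(settings).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

If the indentation of the YAML file put a section at a different level, the resulting keys would differ, which is one thing such a check can reveal.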


(Alexander Reelsen) #3

Hi,

Completely my fault: I tested against Elasticsearch 0.16, which did not include the filter yet. It works smoothly with 0.17. Sorry about that.

I had not upgraded to 0.17 because installing plugins from the filesystem did not work the way it did in 0.16. I tracked it down to not using the complete file:/// URL, which 0.17 now requires, whereas 0.16 accepted a plain directory. This resulted in a zip file exception (which is in fact a file-not-found error). Our river implementation now works with 0.17 as well, and we have upgraded.

Thanks for the help. I'm going to go hide behind a rock in shame now :)

--Alexander
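As an aside on the URL form: in Java, Path.toUri() turns an absolute path into a complete file:/// URI, which is one way to produce such a URL for a local plugin archive. The helper name and the archive path are made up for illustration, and this does not show the 0.17 plugin installation command itself.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: turn a filesystem path into a complete file:/// URL,
// as opposed to passing a bare directory or file path.
final class PluginUrlSketch {
    static String toFileUrl(String path) {
        Path absolute = Paths.get(path).toAbsolutePath().normalize();
        return absolute.toUri().toString();
    }

    public static void main(String[] args) {
        // Made-up archive location; e.g. file:///tmp/my-river-plugin.zip on Unix-like systems.
        System.out.println(toFileUrl("/tmp/my-river-plugin.zip"));
    }
}
```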


(Ivan Brusic) #4

Aha! That explains what I ran into the other day after upgrading. I had assumed it was because the zip file was named wrongly.

--
Ivan

On Tue, Aug 2, 2011 at 7:28 AM, Alexander Reelsen <alexander.reelsen@googlemail.com> wrote:

I tracked it down to not using the complete file:/// URL, which 0.17 now requires, whereas 0.16 accepted a plain directory. This resulted in a zip file exception (which is in fact a file-not-found error).


(Jan Fiedler) #5

Maybe off topic, but maybe helpful anyway: instead of using the PatternReplaceFilter, you may want to look at the ASCIIFoldingFilter, which automatically converts many non-ASCII characters (such as German umlauts) into their ASCII equivalents (http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/analysis/ASCIIFoldingFilter.html). That way you would not have to define an explicit mapping for every character, and you would automatically cover other common cases such as accented characters (as in Crème).
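The folding effect described here can be approximated with the JDK alone: decompose each token (Unicode NFD) and drop the combining accent marks. This is a swapped-in technique for illustration, not how Lucene's ASCIIFoldingFilter works; that filter uses an explicit per-character mapping and also covers letters without an accent decomposition, such as ß or ø, which this sketch leaves untouched. The class name is hypothetical.

```java
import java.text.Normalizer;

// Approximate ASCII folding for accented Latin letters (hypothetical helper):
// NFD splits "è" into "e" plus a combining grave accent, then all combining marks are removed.
final class FoldingSketch {
    static String fold(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("Crème"));   // Creme
        System.out.println(fold("Mädchen")); // Madchen
        System.out.println(fold("Straße"));  // Straße (no decomposition, left as is)
    }
}
```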


(Alexander Reelsen) #6

Hi Jan,

On 3 Aug., 09:05, Jan Fiedler fiedler....@gmail.com wrote:

Maybe off topic, but maybe helpful anyway: instead of using the PatternReplaceFilter, you may want to look at the ASCIIFoldingFilter, which automatically converts many non-ASCII characters (such as German umlauts) into their ASCII equivalents.

Right. As far as I know, this only works if you want to turn ä into a. In some special cases you might want "ae" instead.

--Alexander
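For the "ä" -> "ae" case, the mapping has to be spelled out explicitly, since folding only strips the diaeresis. A minimal sketch of such a mapping, assuming input is first normalized to NFC so that umlauts are single precomposed characters; the helper is hypothetical and not tied to any Elasticsearch filter.

```java
import java.text.Normalizer;

// Hypothetical helper: expand German umlauts (and ß) to their two-letter forms.
final class UmlautExpansionSketch {
    static String expand(String token) {
        // NFC first, so "a" followed by a combining diaeresis becomes the single char 'ä'.
        String s = Normalizer.normalize(token, Normalizer.Form.NFC);
        StringBuilder sb = new StringBuilder(s.length() + 4);
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\u00e4': sb.append("ae"); break; // ä
                case '\u00f6': sb.append("oe"); break; // ö
                case '\u00fc': sb.append("ue"); break; // ü
                case '\u00c4': sb.append("Ae"); break; // Ä
                case '\u00d6': sb.append("Oe"); break; // Ö
                case '\u00dc': sb.append("Ue"); break; // Ü
                case '\u00df': sb.append("ss"); break; // ß
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("Mädchen")); // Maedchen
        System.out.println(expand("Straße"));  // Strasse
    }
}
```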


(Jan Fiedler) #7

Yeah, if you are looking for 'ä' -> 'ae', you may find the following thread helpful (http://elasticsearch-users.115913.n3.nabble.com/Folding-German-characters-like-umlauts-td2176078.html). I have not tried the German2 stemmer myself. Back when I was using plain Lucene (2.x at the time), I relied on the synonym approach described in that thread.
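The German2 variant of the Snowball German stemmer goes the other way: as the Snowball documentation describes it, it treats "ae", "oe" and "ue" as "ä", "ö" and "ü" (leaving "ue" alone after "q"), so "Mueller" and "Müller" can end up as the same term. Below is a simplified sketch of only that pre-stemming step, not the stemmer itself; the class name is hypothetical.

```java
import java.util.Locale;

// Simplified sketch of German2-style umlaut handling (hypothetical helper, no stemming):
// "ae"/"oe"/"ue" become "ä"/"ö"/"ü", except that "qu" is skipped over as a unit.
final class German2UmlautSketch {
    static String canonicalize(String word) {
        String s = word.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("qu", i)) {
                sb.append("qu");
                i += 2;
            } else if (s.startsWith("ae", i)) {
                sb.append('\u00e4'); // ä
                i += 2;
            } else if (s.startsWith("oe", i)) {
                sb.append('\u00f6'); // ö
                i += 2;
            } else if (s.startsWith("ue", i)) {
                sb.append('\u00fc'); // ü
                i += 2;
            } else {
                sb.append(s.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("Mueller").equals(canonicalize("Müller")));
        System.out.println(canonicalize("Quelle"));
    }
}
```

A pure rewrite rule like this will also touch words where "ue" is not an umlaut, which is one reason a synonym list can be the more controlled option.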

