Problem configuring PatternReplaceFilter

Alexander_Reelsen · August 1, 2011, 4:08pm

Hi there,

I am having trouble configuring the pattern replace filter.

My configuration looks like this:

index:
  analysis:
    analyzer:
      default:
        type: ae_analyzer
      ae_analyzer:
        type: custom
        tokenizer: standard
        filter: [umlaut_replace]
    filter:
      umlaut_replace:
        type: pattern_replace
        pattern: "ä"
        replacement: "a"

The exception I get on startup is:

INFO: An exception was caught and reported. Message:
org.elasticsearch.ElasticSearchIllegalArgumentException: Token Filter
[umlaut_replace] must have a type associated with it
org.elasticsearch.ElasticSearchIllegalArgumentException: Token Filter
[umlaut_replace] must have a type associated with it

Taking a look at the analysis module, there is a line referencing
org.elasticsearch:

type = tokenFilterSettings.getAsClass("type", null, "org.elasticsearch.index.analysis.", "TokenFilterFactory");

However, the PatternReplaceFilter is in some org.apache package...

Might this be the cause, or am I simply misconfiguring something badly?

Regards, Alexander

kimchy · August 1, 2011, 9:03pm

Can you do a get settings to see if the type is really there for the filter
(note, settings get munged into key value pairs)? Also, for this use case,
though I would love to help fix it, you might want to consider using the
asciifolding filter.
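
To illustrate the note about settings being flattened into key/value pairs: assuming the nested configuration from the original post is loaded as intended, the filter's type would be expected to show up under a dotted key roughly like the ones below. This is only a sketch of the flattened form, not actual output from a get settings call, and the filter list of the analyzer is left out on purpose.

```yaml
# Sketch: the nested settings from the original post written as flat dotted keys.
# If the "...umlaut_replace.type" key is absent in the real settings output,
# the filter definition was not picked up the way the nested config intended.
index.analysis.analyzer.default.type: ae_analyzer
index.analysis.analyzer.ae_analyzer.type: custom
index.analysis.analyzer.ae_analyzer.tokenizer: standard
index.analysis.filter.umlaut_replace.type: pattern_replace
index.analysis.filter.umlaut_replace.pattern: "ä"
index.analysis.filter.umlaut_replace.replacement: "a"
```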

Alexander_Reelsen · August 2, 2011, 11:28am

Hi,

Completely my fault. I tested against a 0.16 version of elasticsearch,
where the filter was not included yet. It works smoothly with 0.17. Sorry
for that.

I had not upgraded to 0.17 because installing plugins from the
filesystem did not work the way it did in 0.16. I tracked this down to not
using the complete file:/// URL, which 0.17 now requires; in 0.16 it was
enough to provide a directory. The result was some zipfileexception
(which is in fact a file not found error). Now our river implementation
also works with 0.17 and we have upgraded.

Thanks for helping, going to hide ashamed behind a rock now

--Alexander

Ivan · August 2, 2011, 12:54pm

Aha! That explains the situation I was experiencing the other day after
upgrading. I assumed it was due to the zip file being wrongly named.

--
Ivan

Jan_Fiedler · August 3, 2011, 7:05am

Maybe off topic but maybe helpful anyway: Instead of using the
PatternReplaceFilter you may want to look at the ASCIIFoldingFilter that
automatically converts lots of non ASCII characters (such as German umlauts)
into their ASCII equivalents (
http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/analysis/ASCIIFoldingFilter.html).
This way you would not have to define explicit mappings for every character
and would automatically cover other common cases such as accented chars
(like in Crème).
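
As a rough illustration of the suggestion above, a custom analyzer that uses the asciifolding token filter might be configured along these lines. The names folding_analyzer and ascii_fold are hypothetical and chosen only for this sketch; the rest follows the layout of the configuration in the original post.

```yaml
index:
  analysis:
    analyzer:
      folding_analyzer:          # hypothetical analyzer name
        type: custom
        tokenizer: standard
        filter: [lowercase, ascii_fold]
    filter:
      ascii_fold:                # hypothetical filter name
        type: asciifolding       # folds non-ASCII characters such as "ä" to "a"
```

No per-character pattern is needed with this approach, at the cost of not being able to choose the replacement for each character.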

Alexander_Reelsen · August 3, 2011, 7:14am

Hi Jan,

Right. As far as I know, this only works if you want to turn ä into a.
In some special cases you might want to produce "ae" instead.

--Alexander
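
For the "ae" case mentioned here, one possible sketch is to chain several pattern_replace filters, one per character. This assumes a version that ships the filter (0.17 according to this thread); the filter names umlaut_ae, umlaut_oe and umlaut_ue are hypothetical, and the lowercase filter is placed first so that uppercase umlauts are lowercased before the patterns run.

```yaml
index:
  analysis:
    analyzer:
      ae_analyzer:
        type: custom
        tokenizer: standard
        # lowercase first, then one replacement filter per umlaut
        filter: [lowercase, umlaut_ae, umlaut_oe, umlaut_ue]
    filter:
      umlaut_ae:                 # hypothetical filter name
        type: pattern_replace
        pattern: "ä"
        replacement: "ae"
      umlaut_oe:                 # hypothetical filter name
        type: pattern_replace
        pattern: "ö"
        replacement: "oe"
      umlaut_ue:                 # hypothetical filter name
        type: pattern_replace
        pattern: "ü"
        replacement: "ue"
```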

Jan_Fiedler · August 3, 2011, 7:27am

Yeah, if you are looking for the 'ä' -> 'ae' mapping, you may find the following
thread helpful (
http://elasticsearch-users.115913.n3.nabble.com/Folding-German-characters-like-umlauts-td2176078.html).
I have not tried the German2 stemmer myself. Based on pure Lucene (2.x back
then) I relied on the synonym approach described in the thread.
