Problem with keyword_repeat filter

Manuel_Gellfart · May 6, 2013, 2:03pm

Hello,

I am experiencing a problem (maybe a bug?) with the new keyword_repeat filter.

The documentation
http://www.elasticsearch.org/guide/reference/index-modules/analysis/keyword-repeat-tokenfilter/ says:

The keyword_repeat token filter emits each incoming token twice, once as
keyword and once as a non-keyword, to allow an un-stemmed version of a term
to be indexed side by side with the stemmed version of the term.

That's exactly what I want, so I defined a small test analyzer "test":

settings: {
  index: {
    analysis: {
      analyzer: {
        bcsTicketAnalyzer2: {
          type: 'custom',
          tokenizer: 'whitespace',
          filter: [
            'lowercase',
            'keyword_repeat',
            'replacePattern'
          ]
        }
      },
      filter: {
        replacePattern: {
          type: 'pattern_replace',
          pattern: '[!"#$%&\'()*+,./:;<=>?@^_`{|}~-]',
          replacement: ' '
        }
      }
    }
  }
}

Elasticsearch accepts these settings, and I tested the analyzer with the
single word "F-I-TS".
I expected something like this as the result:

{
  "tokens" : [ {
    "token" : "f-i-ts",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "f i ts",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}

But I get this as the result:

{
  "tokens" : [ {
    "token" : "f i ts",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "f i ts",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}

This means that the keyword and the non-keyword token are both post-processed in the same way.
How can I achieve my expected result? This is important for my use case because people sometimes search for company names.
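
To make the discrepancy concrete, here is a toy Python model of the filter chain as I understand it. It is not Elasticsearch code: Token, keyword_repeat, pattern_replace and the respect_keyword switch are made-up names for illustration, and the idea that pattern_replace simply ignores the keyword flag is only my assumption based on the output above.

```python
import re
from dataclasses import dataclass, replace

# Same character class as in the replacePattern filter above.
PUNCTUATION = re.compile(r"""[!"#$%&'()*+,./:;<=>?@^_`{|}~-]""")


@dataclass(frozen=True)
class Token:
    text: str
    keyword: bool = False  # hypothetical stand-in for the keyword flag


def lowercase(tokens):
    return [replace(t, text=t.text.lower()) for t in tokens]


def keyword_repeat(tokens):
    # Emit every token twice: once flagged as keyword, once unflagged.
    out = []
    for t in tokens:
        out.append(replace(t, keyword=True))
        out.append(replace(t, keyword=False))
    return out


def pattern_replace(tokens, respect_keyword):
    # respect_keyword=False rewrites every token (what I observe);
    # respect_keyword=True leaves flagged tokens alone (what I expected).
    out = []
    for t in tokens:
        if respect_keyword and t.keyword:
            out.append(t)
        else:
            out.append(replace(t, text=PUNCTUATION.sub(" ", t.text)))
    return out


def analyze(text, respect_keyword):
    tokens = [Token(w) for w in text.split()]  # whitespace tokenizer
    tokens = keyword_repeat(lowercase(tokens))
    return [t.text for t in pattern_replace(tokens, respect_keyword)]


print(analyze("F-I-TS", respect_keyword=False))  # ['f i ts', 'f i ts']
print(analyze("F-I-TS", respect_keyword=True))   # ['f-i-ts', 'f i ts']
```

The first call reproduces the result I actually get, the second one the result I was hoping for.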

Thanks in advance,

Manuel
