Hello,
I am experiencing a problem (maybe a bug?) with the new keyword_repeat filter.
The documentation at
http://www.elasticsearch.org/guide/reference/index-modules/analysis/keyword-repeat-tokenfilter/
says:
The keyword_repeat token filter emits each incoming token twice, once as
keyword and once as a non-keyword, to allow an un-stemmed version of a term
to be indexed side by side with the stemmed version of the term.
That's exactly what I want, so I defined a small test analyzer:
settings: {
  index: {
    analysis: {
      analyzer: {
        bcsTicketAnalyzer2: {
          type: 'custom',
          tokenizer: 'whitespace',
          filter: [
            'lowercase',
            'keyword_repeat',
            'replacePattern'
          ]
        }
      },
      filter: {
        replacePattern: {
          type: 'pattern_replace',
          pattern: '[!"#$%&\'()*+,./:;<=>?@^_`{|}~-]',
          replacement: ' '
        }
      }
    }
  }
}
Elasticsearch accepts these settings. I tested the analyzer with the single word
"F-I-TS".
I expected a result like this:
{
  "tokens" : [ {
    "token" : "f-i-ts",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "f i ts",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}
But this is the result I actually get:
{
  "tokens" : [ {
    "token" : "f i ts",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "f i ts",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}
This means that both the keyword token and the non-keyword token are post-processed in the same way.
How can I achieve my expected result? This is important for my use case because people sometimes search for company names.
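To make the behaviour concrete, here is a rough Python model of the filter chain as I understand it. This is only a sketch, not Elasticsearch or Lucene code: the function names are made up for illustration, tokens are modelled as (text, is_keyword) pairs, and it assumes that pattern_replace simply ignores the keyword flag, which is what the output above suggests.

```python
import re

# Same character class as the replacePattern filter in the settings above.
PATTERN = re.compile(r"""[!"#$%&'()*+,./:;<=>?@^_`{|}~-]""")


def lowercase(tokens):
    # Lowercase every token, leaving the keyword flag untouched.
    return [(text.lower(), is_keyword) for text, is_keyword in tokens]


def keyword_repeat(tokens):
    # Emit each token twice: first flagged as keyword, then as non-keyword.
    repeated = []
    for text, _ in tokens:
        repeated.append((text, True))
        repeated.append((text, False))
    return repeated


def pattern_replace(tokens):
    # Assumption: the filter rewrites every token and never looks at the flag.
    return [(PATTERN.sub(" ", text), is_keyword) for text, is_keyword in tokens]


def analyze(text):
    # Whitespace tokenizer followed by the three filters, in configured order.
    tokens = [(word, False) for word in text.split()]
    for token_filter in (lowercase, keyword_repeat, pattern_replace):
        tokens = token_filter(tokens)
    return tokens


print(analyze("F-I-TS"))
```

In this model both copies come out as "f i ts" and differ only in the keyword flag, which matches what I see from the real analyzer.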
Thanks in advance,
Manuel