It just occurred to me as I was testing things that the standard filter
does NOT remove punctuation. It was my custom filter in Lucene that was
stripping punctuation, not the standard filter.
I was able to remove punctuation by using a mapping char_filter. Mine
simply removes dots ('.'):
type : custom
filter : [unique, standard, asciifolding, lowercase]
char_filter : [punctuation]
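Spelled out as full index settings, that might look like the sketch below. The analyzer name (no_dots) and the char_filter name (punctuation) are just illustrative; adjust the mapping list if you want to strip more than dots:

```json
{
  "settings": {
    "analysis": {
      "char_filter": {
        "punctuation": {
          "type": "mapping",
          "mappings": [". => "]
        }
      },
      "analyzer": {
        "no_dots": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["punctuation"],
          "filter": ["unique", "standard", "asciifolding", "lowercase"]
        }
      }
    }
  }
}
```

Because the char_filter runs before tokenization, "P.F. Changs" becomes "PF Changs" before the standard tokenizer ever sees it.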
On Fri, May 25, 2012 at 11:41 PM, Michael Sick
I'm still having no luck on this. I've created a more self contained example
for the behavior. In short, I'm storing a document with a field containing:
"P.F. Changs Burgers"
Create & Run Test: https://gist.github.com/2792582
Delete Artifacts: https://gist.github.com/2792590
I'd like ES to provide a match if I search on "p.f.", "p.f", "pf.", or "pf",
regardless of case. Currently only the first two work. My approach relies on
using the Synonym filter to translate all of the forms above to "pf". I'd be
happy to fix this approach or, even better, to learn that there's a more
general approach that doesn't require as much configuration.
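For reference, the synonym-based approach described above could be sketched as the filter below (the filter name pf_synonyms is illustrative, not from the gists). One caveat that may explain the failures: the synonym filter runs after tokenization, so forms the tokenizer has already rewritten (e.g. a trailing dot stripped from "pf.") never reach the filter in the spelled-out form:

```json
{
  "filter": {
    "pf_synonyms": {
      "type": "synonym",
      "synonyms": ["p.f., p.f, pf. => pf"]
    }
  }
}
```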
On Sun, May 20, 2012 at 4:35 PM, Ivan Brusic firstname.lastname@example.org wrote:
The standard filter should remove punctuation from tokens.
You can use the analysis API to view the differences between analyzers
(though unfortunately not between individual tokenizers or filters). The
Lucene in Action book has a summary of the different classes.
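To see offline why only some of the query forms match, here is a rough stand-in for two analyzers (this mimics the observed tokenization, it is not the real Lucene implementation): the standard analyzer keeps dots between letters, so "P.F." indexes as the single token "p.f", which "pf" can never match.

```python
import re

def standard_like(text):
    # Rough stand-in for the standard analyzer: UAX#29-style word breaks
    # keep dots *between* letters ("P.F." -> "p.f") but drop trailing dots,
    # then lowercase each token.
    return [t.lower() for t in re.findall(r"\w+(?:\.\w+)*", text)]

def keyword_like(text):
    # Rough stand-in for the keyword analyzer: whole input as one token.
    return [text.lower()]

print(standard_like("P.F. Changs Burgers"))  # ['p.f', 'changs', 'burgers']
print(keyword_like("P.F. Changs"))           # ['p.f. changs']
```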
On Sat, May 19, 2012 at 8:14 AM, Michael Sick
What's the best way (or tradeoffs) to exclude punctuation (or specific
characters) from certain fields during analysis and searching?
In a document, "P.F. Changs" would match a search for "P.F. Changs" or "PF Changs".
It looks like I could do this with the Synonym filter and the ICU
Are there better options? Which is best or what are the tradeoffs?
Overall, if anyone knows of any resources that compare/contrast the
analyzers/filters/..., it would be very helpful.
Thanks for any advice/pointers,