I have not had a chance to get fully familiar with all the new Lucene 3.1
analyzers, but as far as I understand, it is possible to create token filters
specifically for emails, URLs and paths based on the uax_url_email tokenizer. Is
this directly exposed in ES 0.16?
Looking at rufin's original code, that would be the best solution IMHO
(as he has email addresses in the text). See EmailFilter in
On Thu, Apr 28, 2011 at 1:37 PM, Shay Banon email@example.com wrote:
Yes, that's the new behavior in Lucene 3.1. You can now specify a Lucene
version on tokenizer/analyzer/... to revert to the old behavior.
On Thursday, April 28, 2011 at 11:14 AM, Clinton Gormley wrote:
With 0.16 the highlight output changed a little bit:
The one with only 1 was in version 0.15.2; the output with
multiple is in 0.16. Is this change expected?
I think it isn't the highlighting that has changed, but the default
analyzer which now breaks up email addresses into several terms. Before,
email addresses produced a single term.
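
To make the effect concrete, here is a rough sketch in plain Python (not Lucene or ES code) of why the highlight output changes when the analyzer starts splitting an email address. The two regex "tokenizers", the `highlight` helper and the sample text are all made up for illustration, and the `<em>` tags are assumed as the highlight markers. The exact split points in Lucene 3.1 may differ from this simplified word-only split.

```python
import re

# Hypothetical stand-in for the old behavior: an email address stays one term.
EMAIL_AWARE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\w+")

# Hypothetical stand-in for the new behavior: every run of word characters
# is its own term, so an email address breaks into several terms.
WORD_ONLY = re.compile(r"\w+")


def highlight(text, query, pattern):
    """Wrap every token of `text` that also occurs in `query` in <em> tags.

    `pattern` plays the role of the analyzer: it is applied to both the
    query and the text, as an analyzer would be at search and index time.
    """
    terms = {m.group().lower() for m in pattern.finditer(query)}

    def wrap(match):
        token = match.group()
        return f"<em>{token}</em>" if token.lower() in terms else token

    return pattern.sub(wrap, text)


text = "Contact john.doe@example.com today"
query = "john.doe@example.com"

# One term, so one highlighted fragment.
print(highlight(text, query, EMAIL_AWARE))

# Several terms, so several highlighted fragments.
print(highlight(text, query, WORD_ONLY))
```

With the email-aware pattern the whole address is wrapped once; with the word-only pattern each piece of the address is wrapped separately, which matches the "1 versus multiple" difference described above. The highlighter itself behaves the same in both cases; only the terms it is given differ.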