With 0.16 the highlight output changed a little bit:
Elastica_Query_HighlightTest::testHightlightSearch
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
Array
(
[email] => Array
(
- [0] => <em class="highlight">test@test.com</em>
+ [0] => <em class="highlight">test</em>@<em class="highlight">test.com
)
)
The output with a single highlight tag is from version 0.15.2; the output
with multiple tags is from 0.16. Is this change expected?
I think it isn't the highlighting that has changed, but the default
analyzer which now breaks up email addresses into several terms. Before,
email addresses produced a single term.
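To make the point above concrete, here is a small Python sketch of how the same highlighter can produce both outputs depending only on how the text is split into terms. The two tokenizer functions are toy stand-ins invented for this illustration; they are not how Lucene's analyzers are implemented.

```python
import re

def highlight(text, tokenize):
    """Wrap every token in a highlight tag; text between tokens is kept as-is."""
    out, pos = [], 0
    for start, end in tokenize(text):
        out.append(text[pos:start])
        out.append('<em class="highlight">' + text[start:end] + '</em>')
        pos = end
    out.append(text[pos:])
    return "".join(out)

def single_term(text):
    # Toy stand-in for the old behaviour: the whole address is one term.
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def split_at_sign(text):
    # Toy stand-in for the new behaviour: "@" separates terms.
    return [(m.start(), m.end()) for m in re.finditer(r"[^\s@]+", text)]

old = highlight("test@test.com", single_term)
new = highlight("test@test.com", split_at_sign)
print(old)
print(new)
```

The highlighter itself is identical in both calls; only the term boundaries differ.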
Yes, that's the new behavior in Lucene 3.1. You can now specify a Lucene version on tokenizer/analyzer/... to revert to the old behavior.
On Thursday, April 28, 2011 at 11:14 AM, Clinton Gormley wrote:
Hi Ruflin
I think it isn't the highlighting that has changed, but the default
analyzer which now breaks up email addresses into several terms. Before,
email addresses produced a single term.
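As a rough sketch of what pinning a Lucene version on an analyzer might look like, the Python snippet below builds an index settings body. The analyzer name `email_compat` is hypothetical, and the `version` key and its `"3.0"` value are assumptions that should be checked against the Elasticsearch documentation for your release.

```python
import json

# Hypothetical index settings. "email_compat" is an illustrative name, and the
# "version" key and "3.0" value are assumptions, not confirmed by this thread.
settings = {
    "index": {
        "analysis": {
            "analyzer": {
                "email_compat": {
                    "type": "standard",
                    "version": "3.0",
                }
            }
        }
    }
}

# Serialize to the JSON body that would be sent when creating the index.
body = json.dumps(settings, indent=2)
print(body)
```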
I have not had a chance to get fully familiar with all the new Lucene 3.1
analyzers, but as far as I understand it is possible to create token filters
specifically for emails, URLs and paths based on the uax_url_email tokenizer.
Is this directly exposed in ES 0.16?
I was probably not clear. What I was asking is whether there is any option
to configure an email filter: given a text that contains several email
addresses, the output would be only those email addresses.
That is what that Lucene test does.
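The idea described above (tokens go in, only email addresses come out) can be sketched in Python as a filter on token type. This is a toy model, not Lucene code: the regular expression is a loose illustrative pattern rather than the grammar the real tokenizer uses, and the type labels `<EMAIL>` and `<WORD>` are chosen for this sketch.

```python
import re

# Loose illustrative pattern, not the grammar used by the real tokenizer.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def tokenize(text):
    """Yield (token, type) pairs, typed as "<EMAIL>" or "<WORD>"."""
    for chunk in text.split():
        chunk = chunk.strip(".,;:()<>")
        if not chunk:
            continue
        yield chunk, "<EMAIL>" if EMAIL.fullmatch(chunk) else "<WORD>"

def email_filter(tokens):
    """Keep only the tokens whose type is "<EMAIL>"."""
    return [tok for tok, typ in tokens if typ == "<EMAIL>"]

text = "Write to test@test.com or admin@example.org, not to the list."
emails = email_filter(tokenize(text))
print(emails)
```

Filtering on the token type instead of re-matching the text keeps the filter trivial; all the recognition work stays in the tokenizer.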
There isn't a built-in EmailFilter; the one in the test is used just to make sure the relevant token type is used.
On Thursday, April 28, 2011 at 3:48 PM, Lukáš Vlček wrote:
I was probably not clear, what I was asking about is if there is any option how to configure email filter. This means you have a text which contains several email addresses and the output would be only those email addresses. That is what that Lucene test does.
I think those filters could be useful, for example if I am interested in
emails or URLs and not in the text content in between.
Just an idea for a nice-to-have feature.
Sounds good.
On Thursday, April 28, 2011 at 5:52 PM, Lukáš Vlček wrote:
Shay,
I think those filters could be useful. For example if I am interested in emails or URLs and I am not interested in the stuffed text content inbetween. Just an idea for nice-to-have feature.