[BUG] Re: Highlighting not working with synonym filter


(vineeth mohan-2) #1

Hi, I strongly feel this is a bug.
Should I open an issue in the repo?

Thanks
Vineeth

On Wed, Oct 23, 2013 at 6:36 PM, vineeth mohan vm.vineethmohan@gmail.com wrote:

Steps to reproduce -

Index creation script

  1. Download http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.tar.gz ,
    extract wn_s.pl and place it in the config directory.
  2. Create an index using this script -
    https://gist.github.com/Vineeth-Mohan/7118283
  3. Run the script: ./AboveScript localhost
  4. Index a feed:
    curl -XPOST 'http://localhost:9200/events/news' -d '{ "Events" : { "Event" : "large forest in large" } }'
  5. Execute this search - https://gist.github.com/Vineeth-Mohan/7118312
  6. Output obtained - https://gist.github.com/Vineeth-Mohan/7118329

Observation - Received "large forest in large" rather than
"large forest in large" when searching for big (big is present in the
synonym file).
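
The gists linked in the steps are not reproduced in this thread. As a rough sketch only, request bodies for this kind of setup might look like the following. The names synonym_analyzer and wordnet_synonyms and the field path Events.Event are assumptions for illustration, not taken from the gists; the synonym filter options "format": "wordnet" and "synonyms_path" are standard Elasticsearch settings.

```python
import json

# Hypothetical index settings: a custom analyzer that expands WordNet synonyms.
# Filter/analyzer names are made up; wn_s.pl is the file placed in the config directory.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "wordnet_synonyms": {
                    "type": "synonym",
                    "format": "wordnet",
                    "synonyms_path": "wn_s.pl",
                }
            },
            "analyzer": {
                "synonym_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "wordnet_synonyms"],
                }
            },
        }
    }
}

# Hypothetical search body: match on "big" and ask for highlights on the same field.
search_body = {
    "query": {"match": {"Events.Event": "big"}},
    "highlight": {"fields": {"Events.Event": {}}},
}

print(json.dumps(index_settings, indent=2))
print(json.dumps(search_body, indent=2))
```

The printed JSON could be passed to curl with -d, in the same way as the indexing command in step 4.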

Thanks
Vineeth

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Nik Everett) #2

Just making sure I understand this: you are correctly seeing highlights of
synonyms but incorrectly seeing highlights on the word after the synonym?
You never searched for the word "forest" but it is highlighted anyway, correct?

Before you file it I'd try searching for "forest" with a term filter on
just that "large forest in large" document's id to see what it highlights.
It ought to just highlight forest. If it doesn't, then there is a problem
either with the wordnet file or with how it is parsed.
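
A minimal sketch of the single-document check described above, under stated assumptions: it uses the filtered query syntax of Elasticsearch releases from that period (later versions dropped it), an ids filter as a stand-in for the term filter on the id, the placeholder DOC_ID, and the field path Events.Event. The helper name is hypothetical.

```python
import json

def single_doc_highlight_body(doc_id, field, text):
    """Build a search body that matches `text` in `field`, restricted to one document."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {field: text}},
                # Restrict the search to the one document under test.
                "filter": {"ids": {"values": [doc_id]}},
            }
        },
        "highlight": {"fields": {field: {}}},
    }

# DOC_ID is a placeholder for the id returned when the document was indexed.
body = single_doc_highlight_body("DOC_ID", "Events.Event", "forest")
print(json.dumps(body, indent=2))
```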

Nik



(vineeth mohan-2) #3

Hello Nikolas ,

Searching for forest, I get the highlight below -

  • large forest in large

And when searching for big or large -

  • large forest in large

So the conclusion is that matching a term which doesn't have a synonym
works fine, but when matching a term with a synonym, every indexed term
gets highlighted.

A few more examples -
Text - the lake was frozen and high , all my team was at their best. Lets
do the best then

Search word - lake
Result - the lake was frozen and high , all my team was at
their best. Lets do the best then

Search word - high
Result - the lake was frozen and high , all
my team was at their best. Lets do the best then
Question - How did the terms "all", "my" and "best" get highlighted?
Conclusion - It seems that not every indexed term is getting highlighted,
but there is no pattern in which terms are highlighted. Maybe all terms in
the text that have a synonym are highlighted!

Search word - my
Result - the lake was frozen and high , all my team was at
their best. Lets do the best then
Conclusion - The above conclusion is wrong. If all terms with synonyms were
highlighted on any synonym match, it should have happened for "my" as well.
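
For comparison, here is a small stand-alone sketch of the behaviour one would expect: only the searched term and its synonyms get wrapped. It does not call Elasticsearch. The synonym group is a toy stand-in for the WordNet file, and the function names are hypothetical; the em tag is Elasticsearch's default highlight tag.

```python
import re

# Toy synonym groups; a stand-in for the WordNet file, not its real contents.
SYNONYM_GROUPS = [{"big", "large"}]

def expand(term):
    """Return the term plus any synonyms from the toy groups."""
    term = term.lower()
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)
    return {term}

def expected_highlight(text, query_term):
    """Wrap only the words that equal the query term or one of its synonyms."""
    targets = expand(query_term)

    def wrap(match):
        word = match.group(0)
        return "<em>" + word + "</em>" if word.lower() in targets else word

    return re.sub(r"\w+", wrap, text)

print(expected_highlight("large forest in large", "big"))
# -> <em>large</em> forest in <em>large</em>
print(expected_highlight("large forest in large", "forest"))
# -> large <em>forest</em> in large
```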

Also, I don't find any issue with the analyser or the wordnet file. You can
see the analyser output for the text "large forest" here -
https://gist.github.com/Vineeth-Mohan/7165559

As far as I can see, all the tokens are correctly identified and placed. I
feel this is a bug in the highlighter.
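
One way to make the "correctly identified and placed" check less of an eyeball exercise is to group the analyser's tokens by position and flag positions whose tokens disagree on offsets. This is only a sketch: the sample tokens are made up rather than copied from the gist, the helper names are hypothetical, and the shared-offset expectation is assumed to hold for single-word synonyms only.

```python
def group_by_position(tokens):
    """Group token texts by position, keeping the offsets seen at each position."""
    groups = {}
    for tok in tokens:
        entry = groups.setdefault(tok["position"], {"tokens": [], "offsets": set()})
        entry["tokens"].append(tok["token"])
        entry["offsets"].add((tok["start_offset"], tok["end_offset"]))
    return groups

def positions_with_mixed_offsets(tokens):
    """Return positions whose tokens do not all share the same offsets."""
    groups = group_by_position(tokens)
    return sorted(pos for pos, g in groups.items() if len(g["offsets"]) > 1)

# Made-up sample shaped like the "tokens" list of an _analyze response for "large forest".
sample = [
    {"token": "large", "start_offset": 0, "end_offset": 5, "position": 1},
    {"token": "big", "start_offset": 0, "end_offset": 5, "position": 1},
    {"token": "forest", "start_offset": 6, "end_offset": 12, "position": 2},
]

print(group_by_position(sample)[1]["tokens"])
print(positions_with_mixed_offsets(sample))
```

An empty list from positions_with_mixed_offsets would be consistent with the analyser being fine, which points the suspicion back at the highlighter.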

Conclusion - I am not finding a pattern to this bug. Can I go ahead and
file an issue?

Thanks
Vineeth



(Nik Everett) #4

If you file it with that much information, yes.



(vineeth mohan-2) #5

Filed a bug - https://github.com/elasticsearch/elasticsearch/issues/3982


