Synonym behavior with highlighting and multiple terms


(ppearcy) #1

Hey,
I'm seeing a couple of issues that I have pointed out in this gist:
https://gist.github.com/1317149

The first issue, which appears to be a bug, is that highlighting can be
incorrect in the case of multi-term synonyms. The returned data for
the 2nd result is:
atripla

I reproduced this with term_vector set to no and to
with_positions_offsets, so it appears to affect both highlighting paths.
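
For readers who want to try both highlighting paths, here is a rough sketch of the two field mappings and a highlighted search body, written as Python dicts. The field name `body` and the query text are hypothetical, and the `text` field type follows recent Elasticsearch documentation; the release discussed in this thread may well have used different names, so treat this as an assumption to check against the docs for your version.

```python
import json


def field_mapping(term_vector):
    """Mapping for a hypothetical `body` field with the given term_vector setting."""
    return {"body": {"type": "text", "term_vector": term_vector}}


# One mapping per highlighting path mentioned in the post.
variants = {tv: field_mapping(tv) for tv in ("no", "with_positions_offsets")}

# Search request asking for highlights on the same hypothetical field.
search_body = {
    "query": {"query_string": {"default_field": "body", "query": "truvada"}},
    "highlight": {"fields": {"body": {}}},
}

print(json.dumps(variants, indent=2))
print(json.dumps(search_body, indent=2))
```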

Next, it appears that synonym expansion is applied to the query string,
which gives some unexpected results, as all the terms in the synonym
list are injected into the query. So, looking at the 2nd result again,
atripla is returned because it is a synonym of "efavirenz
emtricitabine" and truvada is a synonym of "emtricitabine
tenofovir", which doesn't seem to be the correct behavior. I
understand why this is occurring: the synonym term expansion happens
on both the index side and the query side, but that doesn't seem right.
When I specify an analyzer without the synonym filter, this
goes away.
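
The double expansion described above can be illustrated with a toy model. This is only a sketch: it is not Lucene's synonym filter, it assumes the two synonym groups quoted in this post, and it assumes a document containing just "atripla", a query for "truvada", and OR-style matching. The helper names are made up for the example.

```python
# Toy model of synonym expansion; it is NOT Lucene's synonym filter.
SYNONYM_GROUPS = [
    ["atripla", "efavirenz emtricitabine"],
    ["truvada", "emtricitabine tenofovir"],
]


def contains_phrase(tokens, phrase):
    """True if the words of `phrase` occur consecutively in `tokens`."""
    words = phrase.split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def expand(tokens):
    """Inject every word of each synonym group that matches `tokens`."""
    terms = set(tokens)
    for group in SYNONYM_GROUPS:
        if any(contains_phrase(tokens, entry) for entry in group):
            for entry in group:
                terms.update(entry.split())
    return terms


def matches(doc_text, query_text, expand_query):
    """OR-style match: the document hits if it shares any term with the query."""
    doc_terms = expand(doc_text.lower().split())  # index-side expansion
    query_tokens = query_text.lower().split()
    query_terms = expand(query_tokens) if expand_query else set(query_tokens)
    return bool(doc_terms & query_terms)


print(matches("atripla", "truvada", expand_query=True))   # True: both sides share "emtricitabine"
print(matches("atripla", "truvada", expand_query=False))  # False: the query is only "truvada"
```

In this model the unexpected hit comes from the shared term "emtricitabine", which both sides gain through expansion; dropping the query-side expansion removes it.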

I am going to play around more with expand=false and see if the
behavior is any better.

Thanks,
Paul


(Shay Banon) #2

I will have a look at it and try and see where the problem is. Just a quick
note, you can configure a different index and search analyzer on a field (or
explicitly an analyzer when you search).

On Wed, Oct 26, 2011 at 8:48 PM, ppearcy ppearcy@gmail.com wrote:

Hey,
I'm seeing a couple of issues that I have pointed out in this gist:
https://gist.github.com/1317149


(ppearcy) #3

FYI, setting the search analyzer not to include the synonym filter, so
that synonyms are applied only at index time, does help with the poor
search results caused by the search-time synonym expansion.
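
A minimal sketch of that setup, as Python dicts that could be serialized to JSON for index creation. All names here (`drug_synonyms`, `with_synonyms`, `without_synonyms`, the `name` field) are hypothetical, the synonym rules are the two groups mentioned earlier in the thread, and the `analyzer` / `search_analyzer` mapping parameters and `text` type follow recent Elasticsearch documentation. Older releases, likely including the one discussed here, used `index_analyzer` and a different field type, so check the docs for your version.

```python
import json

# Analysis settings: one analyzer with the synonym filter, one without.
settings = {
    "analysis": {
        "filter": {
            "drug_synonyms": {
                "type": "synonym",
                "synonyms": [
                    "atripla, efavirenz emtricitabine",
                    "truvada, emtricitabine tenofovir",
                ],
            }
        },
        "analyzer": {
            "with_synonyms": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase", "drug_synonyms"],
            },
            "without_synonyms": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase"],
            },
        },
    }
}

# Field mapping: synonyms at index time only, plain analysis at search time.
field_mapping = {
    "name": {
        "type": "text",
        "analyzer": "with_synonyms",
        "search_analyzer": "without_synonyms",
    }
}

print(json.dumps({"settings": settings, "field": field_mapping}, indent=2))
```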

However, the highlighting issue is still present. I opened a ticket to
track it: https://github.com/elasticsearch/elasticsearch/issues/1444

I hope to find some time to debug this and will update the ticket with
any details I find out.

Thanks,
Paul


(ppearcy) #4

FYI, I'm now hitting a case where the search actually fails. My guess
is that this and the other issue I mentioned in this thread are
related. To reproduce:

