Synonyms not working on 0.17.8?


(ppearcy) #1

Hello,
I am running 0.17.8 in our dev environment and 0.16.5 in QA and
production. I am doing some work around synonyms and noticed that they
don't seem to be working on 0.17.8 or 0.17.7 (I had updated to .8 to
check if this had been fixed).

Here is a gist that shows what I am observing:
https://gist.github.com/1279879

I was testing around punctuation in the synonym list, but even the
simple case of work <=> business doesn't work on 0.17.8.
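
For readers without access to the gist, here is a minimal sketch of the kind of index settings the simple case implies. It is not the contents of the gist. It assumes the `synonym` token filter type with an inline `synonyms` list in Solr format; the names `my_synonyms` and `my_synonym_analyzer` are hypothetical.

```python
import json


def build_synonym_settings(synonyms):
    """Build index settings with a synonym token filter (illustrative names only)."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    "filter": {
                        # Hypothetical filter name; "work, business" makes the two terms equivalent.
                        "my_synonyms": {"type": "synonym", "synonyms": list(synonyms)}
                    },
                    "analyzer": {
                        # Hypothetical analyzer name; lowercase runs before the synonym filter.
                        "my_synonym_analyzer": {
                            "type": "custom",
                            "tokenizer": "standard",
                            "filter": ["lowercase", "my_synonyms"],
                        }
                    },
                }
            }
        }
    }


settings = build_synonym_settings(["work, business"])
print(json.dumps(settings, indent=2))
```

With settings like these, a search for "work" on a field using the analyzer would be expected to also match documents containing "business".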

Please let me know if there are any other details I can provide.
Hopefully, I'm not missing something obvious.

Thanks,
Paul


(ppearcy) #2

FYI, this was caused by updates in Lucene 3.4 to handle synonym files
more efficiently. I made some tweaks and sent a pull request:
https://github.com/elasticsearch/elasticsearch/pull/1386

As a side note, it'd be very nice to define a list of token filters to
apply to the synonym list to ensure the behavior is in sync with the
analysis that has occurred up to that point on the data.
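
To make the idea concrete, here is a rough sketch of what normalizing the synonym list with the same steps as the analysis chain could look like. This is only an illustration of the request, not an existing Elasticsearch or Lucene feature: the "filters" are plain Python functions acting on whole terms, and `normalize_rule` and `strip_punctuation` are hypothetical helpers.

```python
def strip_punctuation(term):
    """Drop every character that is not alphanumeric or whitespace."""
    return "".join(ch for ch in term if ch.isalnum() or ch.isspace())


def normalize_rule(rule, term_filters):
    """Apply each filter, in order, to every term of a comma-separated synonym rule."""
    terms = [term.strip() for term in rule.split(",")]
    for term_filter in term_filters:
        terms = [term_filter(term) for term in terms]
    return ", ".join(terms)


# Stand-ins for the filters that ran earlier in the analysis chain (assumed).
term_filters = [str.lower, strip_punctuation]
rules = ["Work, Business", "U.S.A., USA"]
normalized = [normalize_rule(rule, term_filters) for rule in rules]
print(normalized)
```

The point is that the synonym entries end up in the same form as the tokens reaching the synonym filter, so a rule written with capitals or punctuation still matches.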

Thanks,
Paul


(Robert Muir) #3

On Wed, Oct 12, 2011 at 3:47 AM, ppearcy ppearcy@gmail.com wrote:

FYI, this was caused by updates in lucene3.4 to handle synonym files

This is not true. Lucene did not have a multiword synonym capability before 3.4.

--
lucidimagination.com


(ppearcy) #4

Yes, I really should have said this was caused by the Elasticsearch
update that pulled in Lucene 3.4.

Robert, do you have any further details on multiword synonym support?
Is the link below the best reference or are there others?
http://lucene.apache.org/java/3_4_0/api/all/org/apache/lucene/analysis/synonym/SynonymFilter.html

Was that added with this feature or is there another ticket tracking?
https://issues.apache.org/jira/browse/LUCENE-3233

Thanks,
Paul


(Shay Banon) #5

Grr, ugly bug. The fix is actually simpler, commented on the pull request.


(Lukáš Vlček) #6

Hi,

I'm not an expert here, but I took a quick look at the Lucene 3.4/Solr code,
and it seems to me that the synonym filter code has been ported from Solr to
Lucene. If so, the Solr wiki probably also applies as documentation:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

I haven't tried it myself yet.

HTH,

Regards,
Lukas


(Robert Muir) #7

On Thu, Oct 13, 2011 at 6:42 AM, Lukáš Vlček lukas.vlcek@gmail.com wrote:

Hi,
not an expert here but I tried to look quickly into Lucene 3.4/Solr code and
it seems to me that synonym filter code has been ported from Solr to Lucene.

This is not true. This was rewritten from scratch.

--
lucidimagination.com


(Lukáš Vlček) #8

Hi Robert,

thanks for the clarification!

So if I understand it correctly, Solr's SynonymFilterFactory delegates
either to SlowSynonymFilter (if using Lucene < 3.4) or to SynonymFilter (via
FSTSynonymFilterFactory), which is located in the Lucene module. And that
Lucene module was written from scratch, correct? That would mean the
SynonymFilter JavaDoc mentioned by @ppearcy is probably the best source of
up-to-date documentation about the synonym filter for now (speaking from the
side of an ElasticSearch user).

Lukas


(Robert Muir) #9

On Thu, Oct 13, 2011 at 9:48 AM, Lukáš Vlček lukas.vlcek@gmail.com wrote:

Hi Robert,
thanks for clarification!
So if I understand it correctly, the Solr's SynonymFilterFactory delegates
either to SlowSynonymFilter (if using Lucene < 3.4) or to SynonymFilter (via
FSTSynonymFilterFactory) which is located in Lucene module. And that Lucene
module was written from scratch, correct? Which means that SynonymFilter
JavaDoc mentioned by @ppearcy is probably the best source of up-to-date
documentation about synonym filter for now (speaking from the side of
ElasticSearch user).

Exactly. Lucene didn't have a synonym filter before (except a very
limited single-word one in the wordnet package, which has been replaced
by a wordnet PARSER for this filter).

But Solr had a synonym filter before. There are a couple of corner
cases where it didn't make sense to try to emulate the old Solr
functionality exactly for backwards compatibility, so instead we did
this delegation trick so that Solr users can continue to use the exact
version they had before in those cases.

For Lucene users it's all new functionality... and the javadocs are really
the documentation for Lucene, since it's a library.
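
As a rough illustration of the delegation described above, the choice can be thought of as a version check. This is a sketch based only on the description in this thread, not the actual Solr code; `parse_version` and `choose_synonym_implementation` are hypothetical helpers, and the returned strings are just the class names mentioned above.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.4' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def choose_synonym_implementation(match_version):
    """Pick the filter a factory would delegate to for the requested version."""
    if parse_version(match_version) < (3, 4):
        # Older behaviour preserved for backwards compatibility.
        return "SlowSynonymFilter"
    # New implementation that lives in the Lucene module.
    return "SynonymFilter"


for version in ("3.3", "3.4"):
    print(version, "->", choose_synonym_implementation(version))
```

Comparing tuples rather than strings avoids ordering mistakes such as treating "3.10" as older than "3.4".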

--
lucidimagination.com

