I notice that phrase searches containing stop words never match
any document. For example, a field or term search for "Bank of
America" would have no hits. I understand why this is, but it's
useless to try to explain it to a user.
Is there a reasonable solution without indexing stop words?
I tried one way, but it does not seem to work. I thought if I could
get a list of stop words in use by the current analyzer, I could
remove them from search phrases. So if the user searches for "Bank of
America" I would remove "of" and search for the phrase "Bank
America". Because the analyzer had removed "of" I was thinking that
ES would actually see "Bank" and "America" as adjacent words and match
on the "Bank America" search phrase. It seems this does not work.
Perhaps the index leaves a "hole" where the stop word was and so does
not see "Bank" and "America" as adjacent?
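
To make the "hole" idea concrete, here is a minimal sketch in plain Python. It is not Elasticsearch or Lucene code, only a toy model that assumes the stop filter drops the word but keeps its position slot. The stop-word list and the `analyze` and `exact_phrase_match` helpers are made up for illustration.

```python
# Illustrative stop-word list; the real list depends on the analyzer in use.
STOP_WORDS = {"of", "and", "the"}


def analyze(text):
    """Return (term, position) pairs, dropping stop words but keeping their slots."""
    tokens = []
    for position, word in enumerate(text.lower().split()):
        if word not in STOP_WORDS:
            tokens.append((word, position))
    return tokens


def exact_phrase_match(doc_tokens, first, second):
    """True if `second` occurs at the position immediately after `first`."""
    positions = {}
    for term, pos in doc_tokens:
        positions.setdefault(term, set()).add(pos)
    second_positions = positions.get(second, set())
    return any(p + 1 in second_positions for p in positions.get(first, set()))


indexed = analyze("Bank of America")
print(indexed)  # "bank" and "america" are two positions apart, not one
print(exact_phrase_match(indexed, "bank", "america"))
```

Under that assumption, the hand-built phrase "Bank America" asks for adjacent positions, while the indexed text has a gap between the two terms, which would explain the missing hits.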
If the aforementioned solution did actually work, then a search for
"Bank of America" would also match instances of "Bank and America",
"Bank the America" and so forth. I can live with that, no problem.
By the way, I seem to recall a recent thread on this topic, but I
could not find it, thus a new thread.
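
On the point about also matching "Bank and America": the sketch below models what happens if the search phrase is run through the same analysis as the indexed text, so the gap left by the stop word is preserved on both sides. Again this is a toy model in plain Python with hypothetical helpers (`analyze`, `phrase_matches`) and an illustrative stop-word list, not how ES implements it.

```python
# Illustrative stop-word list; the real list depends on the analyzer in use.
STOP_WORDS = {"of", "and", "the"}


def analyze(text):
    """Return (term, position) pairs, dropping stop words but keeping their slots."""
    return [(w, i) for i, w in enumerate(text.lower().split()) if w not in STOP_WORDS]


def phrase_matches(query, document):
    """True if the analyzed query terms occur in the document at the same relative positions."""
    query_tokens = analyze(query)
    doc_tokens = analyze(document)
    if not query_tokens:
        return False
    doc_set = set(doc_tokens)
    first_term, first_pos = query_tokens[0]
    # Try every occurrence of the first query term as a possible starting point.
    for term, pos in doc_tokens:
        if term != first_term:
            continue
        offset = pos - first_pos
        if all((t, p + offset) in doc_set for t, p in query_tokens):
            return True
    return False


for doc in ["Bank of America", "Bank and America", "Bank America"]:
    print(doc, "->", phrase_matches("Bank of America", doc))
```

In this model any stop word in the middle slot matches, while a document with the two words directly adjacent does not.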
Sorry for the late response; I must have missed your mail. In 0.16.2, the new text family of queries should help. Give it a go.
-shay.banon
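
For anyone trying this, a rough sketch of a request body using the phrase variant of the text queries, built as a Python dict. It assumes the `text_phrase` query type from that family; the field name `name` and the helper `text_phrase_query` are placeholders, so check the docs for your version before relying on it.

```python
import json


def text_phrase_query(field, phrase):
    """Build a search body that runs an analyzed phrase query on one field."""
    # "text_phrase" is assumed to be the phrase variant of the text query family.
    return {"query": {"text_phrase": {field: phrase}}}


# "name" is a hypothetical field; use whichever field holds the text.
body = text_phrase_query("name", "Bank of America")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of a search request against the index in question.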
On Saturday, January 22, 2011 at 9:23 PM, Tim Scott wrote: