Avoiding stop word results


(drahmel7) #1

I had a query that was getting some strange results and I believe it
was caused by a stop word. The query looks like this:

http://localhost:9200/myindex/mytype/_search?q=(is_video:1%20AND%20title_kw:will%20)&pretty=1

This returns 166K results. If I search for just +title_kw:will, it
correctly returns zero results (there are no videos with "will" in the
title).

The query works properly when title_kw is something like "meat",
which returns 32 articles.

However, if I do the same query with title_kw:the -- then it returns
the same 166K results as "will".

This makes me think "will" is a stop word and it is being presumed to
be true in a boolean search.

Is this a correct assumption? Is there any way to query ES for the
stop word list?

If this is true, is there some way to run this query that avoids the
huge, irrelevant response?

Dan Rahmel
http://www.socialtodolist.com


(Shay Banon) #2

Yes, "will" is a stop word. Here is the list of default English stop words:

"a", "an", "and", "are", "as", "at", "be", "but", "by",
"for", "if", "in", "into", "is", "it",
"no", "not", "of", "on", "or", "such",
"that", "the", "their", "then", "there", "these",
"they", "this", "to", "was", "will", "with"
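
A likely explanation for the 166K hits is that the analyzer removes "will" before the query runs, so the title_kw clause ends up with no terms and drops out, leaving only is_video:1. The sketch below is a simplified model of that behavior, not Lucene's actual code: `analyze` and `surviving_clauses` are hypothetical helpers, and the whitespace tokenizer is an assumption made for illustration. The settings body at the end is also an assumption (the analyzer name `title_no_stop` is made up, and whether `"_none_"` or an empty list is accepted for `stopwords` may depend on the ES version), so check it against the documentation for your release.

```python
import json

# Default English stop words, copied from the list above.
DEFAULT_ENGLISH_STOPWORDS = frozenset([
    "a", "an", "and", "are", "as", "at", "be", "but", "by",
    "for", "if", "in", "into", "is", "it",
    "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these",
    "they", "this", "to", "was", "will", "with",
])


def analyze(text, stopwords=DEFAULT_ENGLISH_STOPWORDS):
    """Simplified stand-in for an analyzer: lowercase, split on
    whitespace, and drop stop words. Not the real Lucene pipeline."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


def surviving_clauses(clauses, stopwords=DEFAULT_ENGLISH_STOPWORDS):
    """Keep only the (field, text) clauses that still have at least one
    term after analysis; a clause with no terms is dropped entirely."""
    kept = []
    for field, text in clauses:
        terms = analyze(text, stopwords)
        if terms:
            kept.append((field, terms))
    return kept


# Hypothetical index settings: a standard analyzer with stop words
# disabled. The name "title_no_stop" and the "_none_" value are
# assumptions; verify them against your ES version before use.
NO_STOPWORDS_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "title_no_stop": {"type": "standard", "stopwords": "_none_"}
            }
        }
    }
}

if __name__ == "__main__":
    # "will" is removed, so only the is_video clause remains.
    print(surviving_clauses([("is_video", "1"), ("title_kw", "will")]))
    # "meat" is not a stop word, so both clauses remain.
    print(surviving_clauses([("is_video", "1"), ("title_kw", "meat")]))
    print(json.dumps(NO_STOPWORDS_SETTINGS, indent=2))
```

With stop words disabled for the title field (or with the field indexed without analysis), "will" would be kept as a real term, and the combined query should then return zero results rather than every video. The index would need to be rebuilt for a changed analyzer to take effect.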

(In reply to danr's post #1 of Tuesday, March 6, 2012, 10:25 PM.)

