Custom stop words in default_search


(nurikabe) #1

Hi,

We are trying to set a custom list of stop words for default search by
adding the following to elasticsearch.yml:

index:
  analysis:
    analyzer:
      default_search:
        tokenizer: uax_url_email
        filter: [standard, lowercase, trim]
        stopwords: [email, telephone]

In other words, although we do want ElasticSearch to index tokens like
"email" and "telephone", we want to prohibit searches on these words by
default.

Querying:

{
  "query": {
    "query_string": {
      "query": "email",
      "analyzer": "default_search",
      "fields": [
        "file"
      ]
    }
  }
}

however pulls up all results that match "email". I notice that if we query
on pre-defined stopwords such as "by" the query returns no results.

Am I doing something wrong in my stopwords definition that would cause my
custom list to be ignored and defaults to be used instead?

Thanks


(Shay Banon) #2

You don't have a stop words filter defined in the filter list. What you
need to do is define a custom filter based on the stop token filter (
http://www.elasticsearch.org/guide/reference/index-modules/analysis/stop-tokenfilter.html),
name it something like "my_stop", and then add it to the list of filters
for default_search.
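
As a rough, untested sketch of what that could look like in
elasticsearch.yml: the filter name "my_stop" is just the placeholder
suggested above, and the tokenizer and other filters are copied from the
original config. The explicit "type: custom" line is an addition that was
not in the original snippet.

```yaml
index:
  analysis:
    filter:
      # Custom token filter built on the "stop" token filter type.
      # "my_stop" is an arbitrary, illustrative name.
      my_stop:
        type: stop
        stopwords: [email, telephone]
    analyzer:
      default_search:
        # Assumption: declared explicitly as a custom analyzer.
        type: custom
        tokenizer: uax_url_email
        # my_stop comes after lowercase so that "Email" is dropped too.
        filter: [standard, lowercase, trim, my_stop]
```

Note that a stop filter given an explicit stopwords list should use only
that list, so if the default English stop words are also wanted at search
time they would have to be listed as well.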

On Wed, Mar 14, 2012 at 7:51 PM, nurikabe eaowens@gmail.com wrote:


(system) #3