Stop words filter not working

I've played around with min_similarity quite a bit and it is currently
set to 0.7, so I don't think that is the cause. But thanks for the hint
regarding the explain option. For the query "gmbh" (which is a stopword)
it returns the following:

{
  "value": 0.625,
  "description": "fieldWeight(names:gmbh in 4993), product of:",
  "details": [
    {"value": 1, "description": "tf(termFreq(names:gmbh)=1)"},
    {"value": 1, "description": "idf(docFreq=40, maxDocs=7428)"},
    {"value": 0.625, "description": "fieldNorm(field=names, doc=4993)"}
  ]
}

I don't really get the meaning of this, but it seems that there ARE hits
on gmbh as if it weren't a stopword, although the _settings show that gmbh
is properly configured:

"index.analysis.filter.my_stopwords.stopwords.19" : "gmbh"

I am going nuts here :)
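
The explain fragment quoted above can be read as a plain product:
tf x idf x fieldNorm = 1 x 1 x 0.625 = 0.625. The term it names is
names:gmbh with docFreq=40, which, as far as Lucene's explain format goes,
means a term "gmbh" exists in the names field of 40 documents. A small
Python sketch that recomputes the score from the quoted JSON (the helper
name product_of_details is made up for illustration, it is not part of
any Elasticsearch client):

```python
import json
from math import prod

# The explain fragment from the message above, copied from the response.
explain = json.loads("""
{"value":0.625,"description":"fieldWeight(names:gmbh in 4993), product of:",
 "details":[{"value":1,"description":"tf(termFreq(names:gmbh)=1)"},
            {"value":1,"description":"idf(docFreq=40, maxDocs=7428)"},
            {"value":0.625,"description":"fieldNorm(field=names, doc=4993)"}]}
""")


def product_of_details(node):
    """Multiply the values of the direct children of an explain node."""
    return prod(d["value"] for d in node["details"])


# The recomputed product should equal the reported fieldWeight.
print(product_of_details(explain), explain["value"])
```

So the hit itself is real; the open question is how a "gmbh" term ended up
in the index despite the stop filter.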

On Tuesday, January 29, 2013 4:01:00 PM UTC+1, Martijn v Groningen wrote:

I think the reason you see other companies that have these stopwords
is not that a stopword matched, but that the fuzzy_like_this query has
a low min_similarity. I suggest that you increase it to something like
0.7. You will have to test and play around with it a bit.

If you really think that you have a hit on a stopword (which I don't
think happens), you can verify this with the explain option.
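
As a rough illustration of the explain option: the search request body
accepts a top-level "explain" flag, and each hit then carries an
explanation of how its score was computed. The sketch below only builds
the request body in Python; the helper name with_explain is hypothetical,
and the query clause is borrowed from this thread.

```python
import json


def with_explain(query):
    """Wrap a query clause in a search body that asks for score explanations."""
    return {"explain": True, "query": query}


# Query clause taken from this thread; only the "explain" flag is new.
body = with_explain({
    "fuzzy_like_this": {
        "fields": ["names"],
        "like_text": "gmbh",
        "min_similarity": 0.7,
    }
})

# This JSON would be sent as the body of the search request.
print(json.dumps(body, indent=2))
```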

Martijn

On 29 January 2013 13:41, Haensel <theha...@gmail.com> wrote:

Thanks for your help but it still doesn't work :( Here is the new query:

{"query":{"bool":
{"should":[
{"query_string":{"query":"test gmbh","fuzzy_prefix_length":3,"default_operator":"AND","default_field":"names"}},
{"fuzzy_like_this":{"fields":["names"],"boost":1,"like_text":"test gmbh","min_similarity":0.2,"prefix_length":0,"max_query_terms":25}}]
}}
}

Here's an example: when searching for a company that was indexed as
"TestCompany", a query like "TestComp Limited" (notice the "misspelled"
company name) will find a lot of "Limited" companies, while "TestCompany"
is the string I am really interested in. I simply want "TestCompany" (or a
similar name via fuzzy search) to be first in the list, effectively
ignoring all legal forms defined in my stopwords list.

On Tuesday, January 29, 2013 1:12:59 PM UTC+1, Martijn v Groningen wrote:

Hi Hannes,

You're not specifying a field for either the query_string or the
fuzzy_like_this query, so both fall back to the _all field.
I think it should work if you specify the names field: for the
query_string query use the default_field option, and for
fuzzy_like_this use the fields option to point at your names field.

Martijn

On 29 January 2013 11:37, Haensel theha...@gmail.com wrote:

Hi,

I am using a combination of a query string (the user searches via a
"Google Search" like textbox) and a fuzzy query to be able to find
misspelled names etc. Maybe the fuzzy search makes the stopwords useless?
And if so, would there be a way around that?

{"bool":
{"should":[
{"query_string":{"query":"gmbh","default_operator":"AND"}},

{"fuzzy_like_this":{"boost":1,"like_text":"gmbh","min_similarity":0.5,"prefix_length":0,"max_query_terms":25}}

]}
}

Thanks,

Hannes

On Tuesday, January 29, 2013 9:40:57 AM UTC+1, Martijn v Groningen wrote:

Hi Hannes,

What does the query look like?

Martijn

On 28 January 2013 20:31, Haensel theha...@gmail.com wrote:

Hi,

I have the following problem: I have a list of company names but want to
exclude the "form of the organization" (like Limited, LLC etc.) by using
my own stopwords filter. So far so good, but I can't get it to work. The
indexing works and everything is searchable, but when searching for "LLC"
etc. I am still getting matches. Here is my config (I am using PHP syntax
here, but I guess the values are obvious):

'analysis' => array(
    'analyzer' => array(
        'name_analyzer' => array(
            'type' => 'custom',
            'tokenizer' => 'standard',
            // token filters run in the order listed
            'filter' => array('my_stopwords', 'lowercase', 'icu_normalizer', 'ngram')
        ),
        'address_analyzer' => array(
            'type' => 'custom',
            'tokenizer' => 'standard',
            'filter' => array('standard', 'ngram')
        ),
        'country_analyzer' => array(
            'type' => 'custom',
            'tokenizer' => 'lowercase',
            'filter' => array('country_synonyms')
        ),
    ),
    'filter' => array(
        'ngram' => array(
            'type' => 'nGram',
            'min_gram' => 1,
            'max_gram' => 5,
        ),
        'country_synonyms' => array(
            'type' => 'synonym',
            'synonyms' => array('some synonyms that work perfectly')
        ),
        'my_stopwords' => array(
            'type' => 'stop',
            'stopwords' => array('llc', 'gmbh' /* etc. */),
            'ignore_case' => true
        )
    )
)
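
To reason about what a chain like name_analyzer can emit, here is a rough
Python simulation of stop -> lowercase -> ngram. It is only a sketch under
loud assumptions: whitespace splitting stands in for the standard
tokenizer, icu_normalizer is skipped, the emission order is not meant to
match Lucene's, and all function names and sample strings are made up for
illustration. What it does show is that a stop filter compares whole
tokens, while an n-gram filter emits substrings, so a term "gmbh" can
still reach the index as a substring of a longer token.

```python
def stop_filter(tokens, stopwords, ignore_case=True):
    """Drop tokens that equal a stopword (whole-token comparison)."""
    if ignore_case:
        stop = {w.lower() for w in stopwords}
        return [t for t in tokens if t.lower() not in stop]
    stop = set(stopwords)
    return [t for t in tokens if t not in stop]


def ngram_filter(tokens, min_gram=1, max_gram=5):
    """Emit every substring of each token with length min_gram..max_gram."""
    grams = []
    for t in tokens:
        for n in range(min_gram, max_gram + 1):
            for i in range(len(t) - n + 1):
                grams.append(t[i:i + n])
    return grams


def name_analyzer(text, stopwords):
    """Simplified chain: whitespace split -> stop -> lowercase -> ngram."""
    tokens = text.split()                  # stand-in for the standard tokenizer
    tokens = stop_filter(tokens, stopwords)
    tokens = [t.lower() for t in tokens]   # icu_normalizer is skipped here
    return ngram_filter(tokens)


stopwords = ["llc", "gmbh"]

# "GmbH" is removed as a whole token, so only n-grams of "test" remain.
print(name_analyzer("Test GmbH", stopwords))

# A longer token is not a stopword, but its n-grams include "gmbh".
print("gmbh" in name_analyzer("TestGmbHs", stopwords))
```

Whether this is what happens in the index described here is an open
guess; running the real analyzer against a few sample names would be the
way to confirm it.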

And here is my mapping:

'names' => array(
    'type' => 'string',
    'analyzer' => 'name_analyzer',
    'index_analyzer' => 'name_analyzer',
    'search_analyzer' => 'name_analyzer',
    'include_in_all' => true
),
'addresses' => array(
    'dynamic' => false,
    'analyzer' => 'address_analyzer',
    'index_analyzer' => 'address_analyzer',
    'search_analyzer' => 'address_analyzer',
    'properties' => array(
        'street' => array(
            'type' => 'string',
            'analyzer' => 'address_analyzer',
            'index_analyzer' => 'address_analyzer',
            'search_analyzer' => 'address_analyzer',
            'include_in_all' => true
        ),
        'city' => array(
            'type' => 'string',
            'analyzer' => 'address_analyzer',
            'index_analyzer' => 'address_analyzer',
            'search_analyzer' => 'address_analyzer',
            'include_in_all' => true
        ),
        'state' => array(
            'type' => 'string',
            'analyzer' => 'address_analyzer',
            'index_analyzer' => 'address_analyzer',
            'search_analyzer' => 'address_analyzer',
            'include_in_all' => true
        ),
        'country' => array(
            'type' => 'string',
            'analyzer' => 'country_analyzer',
            'index_analyzer' => 'country_analyzer',
            'search_analyzer' => 'country_analyzer',
            'include_in_all' => true
        )
    )
)

A GET request to myindex/_settings shows that the values are set
correctly. Example:

"index.analysis.filter.my_stopwords.stopwords.90":"llc"

I feel pretty lost here. So any help would be really appreciated!

Thanks in advance,

Hannes

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group, send email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
Kind regards,

Martijn van Groningen
