I've played around with min_similarity quite a bit and it is currently set to 0.7, so I don't think that is the reason. But thanks for the hint about the explain option. For the query "gmbh" (which is a stopword) it returns the following:

{"value":0.625,"description":"fieldWeight(names:gmbh in 4993), product of:",
 "details":[
  {"value":1,"description":"tf(termFreq(names:gmbh)=1)"},
  {"value":1,"description":"idf(docFreq=40, maxDocs=7428)"},
  {"value":0.625,"description":"fieldNorm(field=names, doc=4993)"}]}

I don't really understand this output, but it seems that there ARE hits on gmbh, as if it weren't a stopword, although the _settings output shows that gmbh is properly configured:

"index.analysis.filter.my_stopwords.stopwords.19" : "gmbh"

This is driving me nuts.
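
For readers trying to decode the explain output quoted above: the "product of:" node means the reported value is the product of its detail values (tf, idf and fieldNorm), and docFreq=40 says the term gmbh is present in the names field of 40 documents. A minimal Python sketch that recomputes the value from the quoted JSON; the helper name product_of_details is made up for illustration and is not part of any Elasticsearch client.

```python
import json
import math

# Explain output copied from the message above.
explain = json.loads("""
{"value":0.625,"description":"fieldWeight(names:gmbh in 4993), product of:",
 "details":[
  {"value":1,"description":"tf(termFreq(names:gmbh)=1)"},
  {"value":1,"description":"idf(docFreq=40, maxDocs=7428)"},
  {"value":0.625,"description":"fieldNorm(field=names, doc=4993)"}]}
""")


def product_of_details(node):
    """Multiply the values of a node's direct details (hypothetical helper)."""
    result = 1.0
    for detail in node["details"]:
        result *= detail["value"]
    return result


score = product_of_details(explain)
# The recomputed product should agree with the value reported by explain.
print(score, math.isclose(score, explain["value"]))
```
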
On Tuesday, January 29, 2013 4:01:00 PM UTC+1, Martijn v Groningen wrote:
I think the reason you see other companies that have these stopwords is not that you had a match on a stopword, but that the fuzzy_like_this query has a low min_similarity. I suggest that you increase it to something like 0.7. You will have to test and play around with it a bit.
If you really think that you have a hit on a stopword (which I don't think happens), you can verify this with the explain option.
Martijn
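
A rough sketch of what asking for an explanation could look like, assuming the search request body accepts an "explain": true flag next to the query. The field name, like_text and min_similarity are taken from this thread; the helper with_explain is a hypothetical name, and how the body is sent to the cluster is left out.

```python
import json


def with_explain(query):
    """Wrap a query in a search body that also requests score explanations."""
    return {"query": query, "explain": True}


# Values taken from the thread: the names field, the stopword "gmbh",
# and the suggested min_similarity of 0.7.
query = {
    "fuzzy_like_this": {
        "fields": ["names"],
        "like_text": "gmbh",
        "min_similarity": 0.7,
    }
}

body = with_explain(query)
print(json.dumps(body, indent=2))
```

Each hit in the response should then carry an explanation tree alongside its score, which shows which term actually matched.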
On 29 January 2013 13:41, Haensel <theha...@gmail.com> wrote:
Thanks for your help, but it still doesn't work.
Here is the new query:

{"query":{"bool":
{"should":[
{"query_string":{"query":"testgmbh","fuzzy_prefix_length":3,"default_operator":"AND","default_field":"names"}},
{"fuzzy_like_this":{"fields":["names"],"boost":1,"like_text":"test gmbh","min_similarity":0.2,"prefix_length":0,"max_query_terms":25}}]
}}
}

Here's an example: when searching for a company that was indexed as "TestCompany", a query like "TestComp Limited" (notice the "misspelled" company name) will find a lot of "Limited" companies, while "TestCompany" is the string I am really interested in. I simply want "TestCompany" (or a similar name via fuzzy search) to be first in the list, effectively ignoring all legal forms defined in my stopwords list.

On Tuesday, January 29, 2013 1:12:59 PM UTC+1, Martijn v Groningen wrote:
Hi Hannes,
You're not specifying a field for either the query_string or the fuzzy_like_this query. The behaviour is then to use the _all field.
I think if you specify the names field then it should work. For the query_string query use the default_field option, and for the fuzzy_like_this query use the fields option to point it at your names field.
Martijn

On 29 January 2013 11:37, Haensel theha...@gmail.com wrote:
Hi,
I am using a combination of a query string (the user searches via a "Search"-like textbox) and a fuzzy query to be able to find misspelled names etc. Maybe the fuzzy search makes the stopwords useless? And if so, would there be a way around that?

{"bool":
{"should":[
{"query_string":{"query":"gmbh","default_operator":"AND"}},
{"fuzzy_like_this":{"boost":1,"like_text":"gmbh","min_similarity":0.5,"prefix_length":0,"max_query_terms":25}}
]}
}

Thanks,
Hannes

On Tuesday, January 29, 2013 9:40:57 AM UTC+1, Martijn v Groningen wrote:
Hi Hannes,
What does the query look like?
Martijn

On 28 January 2013 20:31, Haensel theha...@gmail.com wrote:
Hi,
I have the following problem: I have a list of company names but want to exclude the "form of the organization" (like Limited, LLC etc.) by using my own stopwords filter. So far so good, but I can't get it to work. The indexing runs and everything is searchable, but when searching for "LLC" etc. I am still getting matches. Here is my config (I am using PHP syntax here, but
I guess the values are obvious):

'analysis' => array(
    'analyzer' => array(
        'name_analyzer' => array(
            'type' => 'custom',
            'tokenizer' => 'standard',
            'filter' => array('my_stopwords','lowercase','icu_normalizer','ngram')
        ),
        'address_analyzer' => array(
            'type' => 'custom',
            'tokenizer' => 'standard',
            'filter' => array('standard','ngram')
        ),
        'country_analyzer' => array(
            'type' => 'custom',
            'tokenizer' => 'lowercase',
            'filter' => array('country_synonyms')
        ),
    ),
    'filter' => array(
        'ngram' => array(
            'type' => 'nGram',
            'min_gram' => 1,
            'max_gram' => 5,
        ),
        'country_synonyms' => array(
            'type' => 'synonym',
            'synonyms' => array('some synonyms that work perfectly')
        ),
        'my_stopwords' => array(
            'type' => 'stop',
            'stopwords' => array('llc','gmbh' /* etc. */),
            'ignore_case' => true
        )
    )
)

And here is my mapping:

'names' => array(
    'type' => 'string',
    'analyzer' => 'name_analyzer',
    'index_analyzer' => 'name_analyzer',
    'search_analyzer' => 'name_analyzer',
    'include_in_all' => true
),
'addresses' => array(
    'dynamic' => false,
    'analyzer' => 'address_analyzer',
    'index_analyzer' => 'address_analyzer',
    'search_analyzer' => 'address_analyzer',
    'properties' => array(
        'street' => array(
            'type' => 'string',
            'analyzer' => 'address_analyzer',
            'index_analyzer' => 'address_analyzer',
            'search_analyzer' => 'address_analyzer',
            'include_in_all' => true
        ),
        'city' => array(
            'type' => 'string',
            'analyzer' => 'address_analyzer',
            'index_analyzer' => 'address_analyzer',
            'search_analyzer' => 'address_analyzer',
            'include_in_all' => true
        ),
        'state' => array(
            'type' => 'string',
            'analyzer' => 'address_analyzer',
            'index_analyzer' => 'address_analyzer',
            'search_analyzer' => 'address_analyzer',
            'include_in_all' => true
        ),
        'country' => array(
            'type' => 'string',
            'analyzer' => 'country_analyzer',
            'index_analyzer' => 'country_analyzer',
            'search_analyzer' => 'country_analyzer',
            'include_in_all' => true
        )
    )
)

A GET request to myindex/_settings shows that the values are set
correctly. Example:

"index.analysis.filter.my_stopwords.stopwords.90":"llc"

I feel pretty lost here. So any help would be really appreciated!
Thanks in advance,
Hannes
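
As a reading aid for the name_analyzer configured above, here is a toy Python model of its filter chain, assuming the filters run in the listed order (my_stopwords, lowercase, ngram). It is not Lucene's implementation: whitespace splitting stands in for the standard tokenizer, icu_normalizer is skipped, and the function names are made up. It only illustrates one property of the chain as configured: the stop filter sees whole tokens, while the nGram filter (min_gram 1, max_gram 5) later emits substrings of whatever tokens survive. Whether this explains the matches reported in the thread is not established here.

```python
# Two entries named in the thread; the real stopword list is longer.
STOPWORDS = {"llc", "gmbh"}


def ngrams(token, min_gram=1, max_gram=5):
    """Return all substrings of token with length min_gram..max_gram."""
    grams = set()
    for n in range(min_gram, max_gram + 1):
        for i in range(len(token) - n + 1):
            grams.add(token[i:i + n])
    return grams


def analyze(text):
    """Toy stand-in for name_analyzer: stop filter, lowercase, ngram."""
    terms = set()
    for token in text.split():          # stand-in for the standard tokenizer
        if token.lower() in STOPWORDS:  # my_stopwords with ignore_case
            continue
        terms |= ngrams(token.lower())  # lowercase, then nGram
    return terms


# A standalone "GmbH" token is dropped before the ngram step...
print("gmbh" in analyze("Test GmbH"))   # False
# ...but a name written as one word still yields "gmbh" as a 4-gram.
print("gmbh" in analyze("TestGmbH"))    # True
```
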
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group, send email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
Kind regards,
Martijn van Groningen