Partial word match with singular and plurals: Elasticsearch

Kruti_Shukla · May 1, 2014, 7:48am

My final goal is to have the following search precedence:

1. Exact phrase match
2. Exact word match with incremental distance
3. Plurals
4. Substring

Suppose I have the following documents:
i. men’s shaver
ii. men’s shavers
iii. men’s foil shaver
iv. men’s foils shaver
v. men’s foil shavers
vi. men’s foils shavers

Case 1: search for “men’s foil shaver”
Expected result:

men’s foil shaver <------ exact phrase match
men’s foil shavers <------ exact word match on 2 of 3 words with 0 word distance + plural
men’s foils shaver <------ exact word match on 2 of 3 words with 1 word distance + plural
men’s foils shavers <------ exact word match on 1 of 3 words + 2 plurals
men’s shaver <------ exact word match on 2 of 3 words (66% match)
men’s shavers <------ exact word match on 1 of 3 words + plural (66% match)

Case 2: search for “men’s foil shavers”
Expected result:

men’s foil shavers <------ exact phrase match
men’s foil shaver <------ exact word match on 2 of 3 words with 0 word distance + singular
men’s foils shavers <------ exact word match on 2 of 3 words with 1 word distance + singular
men’s foils shaver <------ exact word match on 1 of 3 words + 2 singulars
men’s shavers <------ exact word match on 2 of 3 words (66% match)
men’s shaver <------ exact word match on 1 of 3 words + singular (66% match)

Case 3: search for “men’s foils shavers”
Expected result:

men’s foils shavers <------ exact phrase match
men’s foils shaver <------ exact word match on 2 of 3 words with 0 word distance + singular
men’s foil shavers <------ exact word match on 2 of 3 words with 1 word distance + singular
men’s foil shaver <------ exact word match on 1 of 3 words + 2 singulars
men’s shavers <------ exact word match on 2 of 3 words (66% match)
men’s shaver <------ exact word match on 1 of 3 words + singular (66% match)

Is there any way to achieve this in Elasticsearch?
This question is related to another question of mine that has not been answered yet:
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/elasticsearch/ui9OR7JARs4/Mp3oOtTqY0EJ

Any suggestion would help!
Thank you.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c2ead70e-c5d6-4001-87fd-645a16e670dc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

radu_gheorghe · May 1, 2014, 11:26am

Hi Kruti,

The short answer is yes, it is possible. Here's one way to do it:

Have the fields you search on as multi fields (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html), where you index them with various settings: once not_analyzed for exact matches, once with ngrams to account for typos, and so on. You can query all those sub-fields and use either the multi_match query with best_fields (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-best-fields) or the DisMax query (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html) to wrap all those queries and take the best score (or the best score plus a factor of the other scores, by using the tie breaker).
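A minimal sketch of the DisMax idea, in Python only so that the JSON body can be printed and the combination rule checked: dis_max scores a document with its best clause score plus tie_breaker times the sum of its other matching clause scores. The sub-field names (name.untouched, name.name_stemmer, name.name_gram) and the tie_breaker of 0.3 are assumptions for illustration.

```python
import json


def dis_max_score(clause_scores, tie_breaker):
    """Best clause score plus tie_breaker times the sum of the other clause scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


def build_dis_max_query(text, tie_breaker=0.3):
    """Wrap one query per (hypothetical) sub-field in a dis_max query."""
    return {
        "query": {
            "dis_max": {
                "tie_breaker": tie_breaker,
                "queries": [
                    {"match": {"name.untouched": text}},
                    {"match_phrase": {"name.name_stemmer": {"query": text, "slop": 5}}},
                    {"match": {"name.name_gram": text}},
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_dis_max_query("men's foil shaver"), indent=2))
    # A document scoring 1.0, 0.5 and 0.2 on the three clauses:
    print(dis_max_score([1.0, 0.5, 0.2], tie_breaker=0.3))
```

With tie_breaker 0 only the best sub-field counts; raising it lets documents that match on several sub-fields pull ahead.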

Now, for the specific requirements you have:

1. For exact matching, you can skip analysis altogether and set "index" to "not_analyzed". Alternatively, you could use the simple analyzer (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html#analysis-simple-analyzer) or something equally "harmless" to allow for some error. You could boost this kind of query a lot, so that exact matches come out on top.
2. For phrase matches with distance, you can use the match_phrase type of the match query (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase). You can configure a slop that defines the maximum allowed distance for a match to show up in your results. Documents with "closer" words should get higher scores. You would boost this query less than the exact matches, but more than the following ones.
3. For handling plurals, you'd probably need to do some stemming. Have a look at the snowball token filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-snowball-tokenfilter.html) or the stemmer token filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html#analysis-stemmer-tokenfilter). Again, this would be boosted lower than 1) and 2), but higher than 4).
4. For handling substrings, you can use ngrams, as you already seem to be doing. Alternatively, you can pay the price at query time by using the "fuzziness" option of the match query.
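One way to put the four tiers above into a single query, sketched in Python to produce the JSON body. The sub-field names and the boosts 10/5/2/1 are hypothetical; the boosts only encode the ordering 1) > 2) > 3) > 4) and would need tuning against real scores. This uses a bool query, which adds up the scores of all matching clauses; a dis_max would instead keep mainly the best one.

```python
import json

# (sub-field, query type, boost). Field names and boosts are hypothetical;
# the boosts only encode the precedence 1) > 2) > 3) > 4).
TIERS = [
    ("name.untouched", "match", 10),            # 1) exact match, not_analyzed
    ("name.name_standard", "match_phrase", 5),  # 2) phrase with slop, no stemming
    ("name.name_stemmer", "match", 2),          # 3) plurals via a stemmer
    ("name.name_gram", "match", 1),             # 4) substrings via ngrams
]


def build_tiered_query(text, slop=5):
    """One should clause per tier; bool sums the scores of the clauses that match."""
    should = []
    for field, query_type, boost in TIERS:
        params = {"query": text, "boost": boost}
        if query_type == "match_phrase":
            params["slop"] = slop
        should.append({query_type: {field: params}})
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    print(json.dumps(build_tiered_query("men's foil shaver"), indent=2))
```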

Best regards,
Radu

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

Kruti_Shukla · May 1, 2014, 12:37pm

Hi Radu,

Thank you so much for the suggestions. I knew about multi-field, but I didn't know how helpful it could be; now I'm able to play with the multi-field feature.
I followed your suggestions and created the index and mapping accordingly.

I tried querying for the first two: the first query is a simple one and the second one uses slop. It does not return the results in the correct slop order (i.e., by incremental distance).
Please help/suggest query improvements.

Please see my settings below:

For index:

```
curl -XPUT "http://localhost:9200/my_improved_index" -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "trigrams_filter": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 50
        },
        "my_stemmer": {
          "type": "stemmer",
          "name": "minimal_english"
        }
      },
      "analyzer": {
        "trigrams": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "trigrams_filter"
          ]
        },
        "my_stemmer_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "my_stemmer"
          ]
        }
      }
    }
  }
}'
```
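To get a feel for how many terms trigrams_filter produces with min_gram 1 and max_gram 50, here is a rough Python emulation. It assumes the filter emits every substring whose length lies between min_gram and max_gram, and it does not model the order in which Lucene emits the grams. Despite its name, this filter therefore produces grams of every length from 1 up, not only trigrams.

```python
def ngrams(token, min_gram=1, max_gram=50):
    """Every substring of `token` whose length lies in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams


if __name__ == "__main__":
    for word in ["men's", "foil", "shaver", "shavers"]:
        print(word, len(ngrams(word)), ngrams(word)[:5])
```

A word of length n yields n(n+1)/2 grams here, and with min_gram 1 single letters such as "s" become terms, so a query on the name_gram sub-field analyzed the same way is likely to share terms with almost every document.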

For mappings:

```
curl -XPUT "http://localhost:9200/my_improved_index/my_improved_index_type/_mapping" -d'
{
  "my_improved_index_type": {
    "properties": {
      "name": {
        "type": "multi_field",
        "fields": {
          "name_gram": {
            "type": "string",
            "analyzer": "trigrams"
          },
          "untouched": {
            "type": "string",
            "index": "not_analyzed"
          },
          "name_stemmer": {
            "type": "string",
            "analyzer": "my_stemmer_analyzer"
          }
        }
      }
    }
  }
}'
```
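The name_stemmer sub-field uses the minimal_english stemmer, which, as far as I know, is Lucene's light plural-only English stemmer. The function below is a simplified approximation built on Harman's S-stemmer rules as they are commonly described, not Lucene's actual code; whitespace splitting and lowercasing stand in for the standard tokenizer and lowercase filter. It is only meant to show that singular and plural forms end up as the same indexed term.

```python
def s_stem(word):
    """Approximate plural stripping (Harman S-stemmer rules as commonly described).
    An illustration only, not Lucene's minimal_english implementation."""
    if len(word) < 3:
        return word
    if word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-1]
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]
    return word


def analyze(text):
    """Lowercase + whitespace split stand in for the standard tokenizer and lowercase filter."""
    return [s_stem(token) for token in text.lower().split()]


if __name__ == "__main__":
    for doc in ["foil shaver", "foil shavers", "foils shaver", "foils shavers"]:
        print(doc, "->", analyze(doc))
```

All four variants come out as the same terms, so a query on name.name_stemmer alone cannot tell an exact word form from its plural.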

Available documents:

men’s shaver
men’s shavers
men’s foil shaver
men’s foils shaver
men’s foil shavers
men’s foils shavers
men's foil advanced shaver
norelco men's foil advanced shaver

Query:

```
curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d'
{
  "size": 30,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name.untouched": {
              "query": "men'\''s shaver",
              "operator": "and",
              "type": "phrase",
              "boost": 10
            }
          }
        },
        {
          "match_phrase": {
            "name.name_stemmer": {
              "query": "men'\''s shaver",
              "slop": 5
            }
          }
        }
      ]
    }
  }
}'
```
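An apostrophe inside a curl -d'...' body ends the shell string, which is why the query text needs escaping there. One alternative is to build the same body in a program and serialize it. This minimal Python sketch only builds and prints the JSON (mirroring the query above); sending it is left to whatever HTTP client you use, for example by saving the output to a file and passing it with curl -d @file.

```python
import json


def build_query(text):
    """Same structure as the curl query above, built as Python dicts."""
    return {
        "size": 30,
        "query": {
            "bool": {
                "should": [
                    {"match": {"name.untouched": {
                        "query": text, "operator": "and", "type": "phrase", "boost": 10}}},
                    {"match_phrase": {"name.name_stemmer": {"query": text, "slop": 5}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    # json.dumps handles the apostrophe in "men's"; no shell escaping is needed.
    print(json.dumps(build_query("men's shaver"), indent=2))
```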

Returned result:

men's shaver --> correct
men's shavers --> correct
men's foils shaver --> NOT correct
norelco men's foil advanced shaver --> NOT correct
men's foil advanced shaver --> NOT correct
men's foil shaver --> NOT correct.

Expected result:

men's shaver --> exact phrase match
men's shavers --> ZERO word distance + 1 plural
men's foil shaver --> 1 word distance
men's foils shaver --> 1 word distance + 1 plural
men's foil advanced shaver --> 2 word distance
norelco men's foil advanced shaver --> 2 word distance

Why did the higher-distance documents score higher?
Is there a problem with my stemmer or nGram settings?
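For intuition about what the sloppy phrase clause should do on its own: Lucene's classic TF-IDF similarity (the default in Elasticsearch 1.x) counts each sloppy match as 1 / (distance + 1), uses the square root of that frequency as tf, and multiplies by a length norm of 1 / sqrt(number of terms). The Python sketch computes only that proxy. It ignores idf, query norm, boosts, the other bool clause, per-shard statistics and the lossy one-byte norm encoding, handles only in-order matches, and assumes whitespace tokens with a straight apostrophe, so it is not the real score, just a check of which order the phrase clause favours for the documents without plurals.

```python
import math


def phrase_distance(tokens, first, second):
    """Smallest number of tokens between `first` and a later `second` (0 = adjacent)."""
    best = None
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        for j in range(i + 1, len(tokens)):
            if tokens[j] == second:
                gap = j - i - 1
                if best is None or gap < best:
                    best = gap
                break
    return best


def phrase_score_proxy(text, first="men's", second="shaver"):
    tokens = text.lower().split()
    distance = phrase_distance(tokens, first, second)
    if distance is None:
        return 0.0
    sloppy_freq = 1.0 / (distance + 1)          # classic sloppyFreq
    length_norm = 1.0 / math.sqrt(len(tokens))  # classic lengthNorm, before byte encoding
    return math.sqrt(sloppy_freq) * length_norm  # tf = sqrt(freq)


DOCS = [
    "norelco men's foil advanced shaver",
    "men's foil advanced shaver",
    "men's foil shaver",
    "men's shaver",
]

if __name__ == "__main__":
    for doc in sorted(DOCS, key=phrase_score_proxy, reverse=True):
        print(f"{phrase_score_proxy(doc):.4f}  {doc}")
```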

Kruti_Shukla · May 2, 2014, 10:40am

Any help?
Why did the higher-distance documents score higher?
Is there a problem with my stemmer or nGram settings?

radu_gheorghe · May 2, 2014, 12:30pm

Hello,

The exact-match vs. plural problem is probably caused by the stemmer. With your fields and queries as they are now, Elasticsearch has no way to boost individual exact word matches higher. To fix this, you can add another sub-field where you just analyze the text with the standard analyzer (no stemming), then add a query on that sub-field to your bool query, and exact word matches should be ranked higher. I would use a simple match for that (not a phrase), to account for the case where one word is exact and one is plural: such a document should be ranked higher than one where both words are plural. You'll get that with a standard match because it looks for all the terms individually, while match_phrase tries to match the whole phrase with the given slop, so neither of those two documents would match it.
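A sketch of that suggestion as Python dicts serialized to JSON. The sub-field name name_standard and the boost of 3 are hypothetical; the other two clauses mirror the query from earlier in the thread. The exact_word_overlap helper is only a rough stand-in for what the new plain match clause rewards: how many query words appear unchanged in the document.

```python
import json

# Hypothetical extra sub-field, to be added next to name_gram, untouched and
# name_stemmer in the multi_field mapping. It uses the built-in "standard"
# analyzer, so no stemming happens. Existing documents must be reindexed
# before they have this sub-field.
NAME_STANDARD_SUBFIELD = {"name_standard": {"type": "string", "analyzer": "standard"}}


def build_query(text):
    return {
        "size": 30,
        "query": {
            "bool": {
                "should": [
                    {"match": {"name.untouched": {
                        "query": text, "operator": "and", "type": "phrase", "boost": 10}}},
                    {"match_phrase": {"name.name_stemmer": {"query": text, "slop": 5}}},
                    # Plain match (not a phrase): each exact word form scores on its own.
                    {"match": {"name.name_standard": {"query": text, "boost": 3}}},
                ]
            }
        },
    }


def exact_word_overlap(query, doc):
    """Rough proxy for the new clause: distinct query words present unchanged in doc."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


if __name__ == "__main__":
    print(json.dumps(NAME_STANDARD_SUBFIELD, indent=2))
    print(json.dumps(build_query("men's foil shaver"), indent=2))
    for doc in ["men's foil shavers", "men's foils shavers"]:
        print(doc, exact_word_overlap("men's foil shaver", doc))
```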

I don't know why the higher-distance document scored higher in your case; the 6th result should have been higher. Can you try with an index of one shard and see if the results are any different?

Either way, you can get an explanation for each document's score by enabling the explain option.
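The two checks suggested above, sketched as request bodies in Python: settings for a throwaway copy of the index with a single shard (with several shards, each shard computes term statistics from its own documents, which can skew scores on a tiny index), and a search body with "explain": true, which asks Elasticsearch to return a score breakdown for each hit. The index name my_improved_index_1shard is hypothetical; you would also copy over the "analysis" section and the mapping from before.

```python
import json

# Hypothetical name for a throwaway copy of the index.
ONE_SHARD_INDEX = "my_improved_index_1shard"

# Body for PUT /my_improved_index_1shard (add the same "analysis" section as before).
create_body = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
    }
}

# Body for POST /<index>/_search: "explain": true adds a score breakdown to every hit.
search_body = {
    "explain": True,
    "size": 30,
    "query": {
        "match_phrase": {"name.name_stemmer": {"query": "men's shaver", "slop": 5}}
    },
}

if __name__ == "__main__":
    print(json.dumps(create_body, indent=2))
    print(json.dumps(search_body, indent=2))
```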

Best regards,
Radu

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Fri, May 2, 2014 at 1:40 PM, Kruti Shukla krutibhatt17@gmail.com wrote:

Any help?
Why higher distance document scored higher?
Is there any problem with stemmer or nGram settings?

On Thursday, May 1, 2014 8:37:09 AM UTC-4, Kruti Shukla wrote:
Hi Radu,

Thank you so for the suggestions. I was knowing mul-field but was not
knowing how helpful it can be but now I'm able play with the multi field
feature.
I tried following suggestion and created index and mapping accordingly.

I tried querying for first 2. First one was simple and second one with
slop. It is not returning correct slop(i,e, incremental distance).
Please help/suggest query improvements.

Please see my settings below:

*For index: *
curl -XPUT "http://localhost:9200/my_improved_index" -d'
{
"settings": {
"analysis": {
"filter": {
"trigrams_filter": {
"type": "ngram",
"min_gram": 1,
"max_gram": 50
},
"my_stemmer" : {
"type" : "stemmer",
"name" : "minimal_english"
}
},
"analyzer": {
"trigrams": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"trigrams_filter"
]
},
"my_stemmer_analyzer":{
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"my_stemmer"
]
}
}
}
}
}'

For mappings:
curl -XPUT "http://localhost:9200/my_improved_index/my_improved_
index_type/_mapping" -d'
{
"my_improved_index_type": {
"properties": {
"name": {
"type": "multi_field",
"fields": {
"name_gram": {
"type": "string",
"analyzer": "trigrams"
},
"untouched": {
"type": "string",
"index": "not_analyzed"
},
"name_stemmer":{
"type": "string",
"analyzer": "my_stemmer_analyzer"
}
}
}
}
}

}'

Available documents:
men’s shaver

men’s shavers
men’s foil shaver
men’s foils shaver

men’s foil shavers

men’s foils shavers

men's foil advanced shaver

norelco men's foil advanced shaver
Query:
curl -XPOST "http://localhost:9200/my_improved_index/my_improved_
index_type/_search" -d'
{
"size": 30,
"query": {
"bool": {
"should": [
{
"match": {
"name.untouched": {
"query": "men"s shaver",
"operator": "and",
"type": "phrase",
"boost": "10"
}
}
},
{
"match_phrase": {
"name.name_stemmer": {
"query": "men"s shaver",
"slop": 5
}
}
}
]
}
}
}'

Returned result:

men's shaver --> correct

men's shavers --> correct

men's foils shaver --> NOT correct

norelco men's foil advanced shaver --> NOT correct

men's foil advanced shaver --> NOT correct

men's foil shaver --> NOT correct.

Expected result:

men's shaver --> exact phrase match

men's shavers --> ZERO word distance + 1 plural

men's foil shaver --> 1 word distance

men's foils shaver --> 1 word distance + 1 plural

men's foil advanced shaver --> 2 word distance

norelco men's foil advanced shaver --> 2 word distance

Why higher distance document scored higher?
Is there any problem with stemmer or nGram settings?

On Thursday, May 1, 2014 7:26:02 AM UTC-4, Radu Gheorghe wrote:

Hi Kruti,

The short answer is yes, it is possible. Here's one way to do it:

Have the fields you search on as multi fieldhttp://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html,
where you index them with various settings, like once not-analyzed for
exact matches, once with ngrams to account for typoes and so on. You can
query all those sub-fields, and use the multi-match query with best
fieldshttp://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-best-fieldsor the DisMax
queryhttp://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.htmlto wrap all those queries and take the best score (or the best score and a
factor of the other scores by using the tie breaker).

Now, for the specific requirements you have:

For exact matching, you can skip analysis altogether, and set "index"
to "not_anyzed". Alternatively, you could use the simple analyzerhttp://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html#analysis-simple-analyzer or
something equally "harmless" to allow for some error. You could boost this
kind of query a lot, so that exact matches come out on top

For phrase matches with distance, you can use the match_phrase type
of the match queryhttp://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase.
You can configure a slop that defines the maximum allowed distance
for a match to show up in your results. Documents with "closer" words
should get higher scores. You would boost this query less than the exact
matches, but more than the following.

For handling plurals, you'd probably need to do some stemming. Have a
look at the snowball token filterhttp://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-snowball-tokenfilter.htmlor the stemmer
token filterhttp://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html#analysis-stemmer-tokenfilter.
Again, this would be boosted lower than 1) and 2), but more than 4)

For handling substrings, you can use ngrams, as you already seem to
be doing. Alternatively, you can pay the price at query time by using the
"fuziness" option of the match query.

Best regards,
Radu

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


Kruti_Shukla · May 2, 2014, 1:10pm

Hi Radu,
Thank you so much for your reply and suggestions. They are really helping me
solve my problem and are also improving my knowledge of Elasticsearch.

I now have the index on only 1 shard, and the results are somewhat improved.
I also added one more sub-field with the "standard" analyzer:

```
PUT /my_improved_index/my_improved_index_type/_mapping
{
  "my_improved_index_type": {
    "properties": {
      "name": {
        "type": "multi_field",
        "fields": {
          "name_gram": {
            "type": "string",
            "index_analyzer": "trigrams"
          },
          "untouched": {
            "type": "string",
            "index": "not_analyzed"
          },
          "name_stemmer": {
            "type": "string",
            "analyzer": "my_stemmer_analyzer"
          },
          "name_standard": {
            "type": "string",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
```

There are still problems with the returned results.
Query:

```
curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d'
{
  "size": 30,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name.untouched": {
              "query": "men'\''s foil shaver",
              "operator": "and",
              "type": "phrase",
              "boost": "10"
            }
          }
        },
        {
          "match_phrase": {
            "name.name_stemmer": {
              "query": "men'\''s foil shaver",
              "slop": 5
            }
          }
        },
        {
          "match": {
            "name.name_standard": {
              "query": "men'\''s foil shaver"
            }
          }
        }
      ]
    }
  }
}'
```
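Because the request body is passed inside a single-quoted shell string (`-d'...'`), the apostrophe in men's cannot appear literally; it has to be written as `'\''`, or the body has to reach curl some other way. A small sketch of one alternative, assuming Python is available on the client: build the same three clauses as a dict, let `json.dumps` handle JSON escaping and `shlex.quote` handle shell quoting. (curl can also read the body from a file with `--data-binary @body.json`; that file name is hypothetical.)

```python
import json
import shlex

text = "men's foil shaver"

# The three should-clauses from the query above, with the apostrophe intact.
body = {
    "size": 30,
    "query": {
        "bool": {
            "should": [
                {"match": {"name.untouched": {
                    "query": text, "operator": "and",
                    "type": "phrase", "boost": "10"}}},
                {"match_phrase": {"name.name_stemmer": {"query": text, "slop": 5}}},
                {"match": {"name.name_standard": {"query": text}}},
            ]
        }
    },
}

payload = json.dumps(body)
url = "http://localhost:9200/my_improved_index/my_improved_index_type/_search"

# shlex.quote wraps the payload in single quotes and rewrites each
# embedded ' so that a POSIX shell passes the string through unchanged.
command = "curl -XPOST " + shlex.quote(url) + " -d " + shlex.quote(payload)
print(command)
```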

Returned results:

1. men's foil shaver --> score: 4.4437184
2. men's foils shaver --> score: 0.5215846
3. men's foil advanced shaver --> score: 0.49008065 --> should be 4th
4. norelco men's foil advanced shaver --> score: 0.42882058 --> should be 5th
5. men's shaver --> score: 0.04429976 --> should be 6th
6. men’s foil shavers --> score: 0.010844119 --> should be 3rd
7. men's shavers --> score: 0.010372223
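One detail worth checking, which the thread itself does not confirm: as the text appears in this archive, the sixth result (and the `_source` in the explanation that follows) uses a typographic apostrophe (’, U+2019), while the query uses a straight one ('). If the stored text really differs like this, the `untouched` sub-field cannot match exactly and the analyzed tokens for "men’s" and "men's" differ as well, which would be consistent with only `foil` matching in that explanation. The sketch below shows the kind of normalization that would remove the difference; the set of characters treated as apostrophes is an assumption. In Elasticsearch, a `mapping` char filter in the analyzers could do the equivalent at both index and search time.

```python
# Characters treated as apostrophes here (an assumption; extend as needed).
APOSTROPHES = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK, as in men’s
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
}
_TABLE = str.maketrans(APOSTROPHES)


def normalize_apostrophes(text):
    """Map typographic apostrophes to the ASCII apostrophe."""
    return text.translate(_TABLE)


stored = "men\u2019s foil shavers"  # as shown in the explanation's _source
query = "men's foil shavers"

print(stored == query)                         # False: different code points
print(normalize_apostrophes(stored) == query)  # True after normalization
```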

Please suggest what to change. I tried setting explain to true, but it did not
help much.

Below is the explanation for the 6th returned result, "men's foil shavers":

{
  "_shard": 0,
  "_node": "VRNH3VrlTC2Tu6y_GgDZbw",
  "_index": "my_improved_index",
  "_type": "my_improved_index_type",
  "_id": "35",
  "_score": 0.010844119,
  "_source": {
    "name": "men’s foil shavers"
  },
  "_explanation": {
    "value": 0.010844119,
    "description": "product of:",
    "details": [
      {
        "value": 0.032532357,
        "description": "sum of:",
        "details": [
          {
            "value": 0.032532357,
            "description": "product of:",
            "details": [
              {
                "value": 0.09759706,
                "description": "sum of:",
                "details": [
                  {
                    "value": 0.09759706,
                    "description": "weight(name.name_standard:foil in 26) [PerFieldSimilarity], result of:",
                    "details": [
                      {
                        "value": 0.09759706,
                        "description": "score(doc=26,freq=1.0 = termFreq=1.0\n), product of:",
                        "details": [
                          {
                            "value": 0.07266014,
                            "description": "queryWeight, product of:",
                            "details": [
                              { "value": 2.686399, "description": "idf(docFreq=4, maxDocs=27)" },
                              { "value": 0.027047412, "description": "queryNorm" }
                            ]
                          },
                          {
                            "value": 1.3431995,
                            "description": "fieldWeight in 26, product of:",
                            "details": [
                              {
                                "value": 1,
                                "description": "tf(freq=1.0), with freq of:",
                                "details": [
                                  { "value": 1, "description": "termFreq=1.0" }
                                ]
                              },
                              { "value": 2.686399, "description": "idf(docFreq=4, maxDocs=27)" },
                              { "value": 0.5, "description": "fieldNorm(doc=26)" }
                            ]
                          }
                        ]
                      }
                    ]
                  }
                ]
              },
              { "value": 0.33333334, "description": "coord(1/3)" }
            ]
          }
        ]
      },
      { "value": 0.33333334, "description": "coord(1/3)" }
    ]
  }
}
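The explanation can be checked by hand, which makes the low rank easier to read. Under Lucene's classic TF-IDF similarity (the default in Elasticsearch 1.x), a term's score is queryWeight × fieldWeight, and each enclosing boolean query multiplies its summed clause scores by coord(matching clauses / total clauses). The sketch below recomputes the reported score from the numbers in the output above; queryNorm is taken as given, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is Lucene's DefaultSimilarity, which reproduces the 2.686399 shown. The two coord(1/3) factors are consistent with only one of three query terms (`foil`) matching in `name.name_standard`, and only one of the three `should` clauses matching overall, so the term score is divided by 3 twice.

```python
import math

# Values copied from the explanation above.
doc_freq, max_docs = 4, 27
query_norm = 0.027047412
tf = 1.0          # tf(freq=1.0) for "foil"
field_norm = 0.5  # fieldNorm(doc=26)

# Lucene DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
idf = 1 + math.log(max_docs / (doc_freq + 1))

query_weight = idf * query_norm           # "queryWeight, product of"
field_weight = tf * idf * field_norm      # "fieldWeight in 26, product of"
term_score = query_weight * field_weight  # weight(name.name_standard:foil)

inner = term_score * (1 / 3)  # coord(1/3) inside the name_standard match
total = inner * (1 / 3)       # coord(1/3): 1 of 3 should-clauses matched

print(round(idf, 6), round(term_score, 8), round(total, 9))
```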

Kruti_Shukla · May 5, 2014, 11:48am

I tried changing the tokenizer from "standard" to "whitespace". In the mapping,
I split the analyzer setting so that "index_analyzer" uses my custom analyzer
and "search_analyzer" uses the default standard analyzer. The results still
have not improved, and the explanation is not very helpful either.

--
Cheers!!
Kruti
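On separating the index and search analyzers: the two analyzers of a sub-field have to produce compatible tokens. For the n-gram sub-field, a plain search analyzer is usually what you want, so the query string is not itself chopped into n-grams. For the stemmer sub-field, though, if "standard" is used at search time, a query for "shavers" stays "shavers" while the index holds the stemmed "shaver", so plural matching is lost. A sketch of a mapping that keeps that distinction, assuming the analyzer names from the mapping earlier in this thread (`trigrams`, `my_stemmer_analyzer`) and written with the `fields` parameter that Elasticsearch 1.x accepts on a `string` field (the `multi_field` type used above is the older form). Which sub-fields get which analyzers is an assumption to adapt, not a confirmed fix.

```python
import json

mapping = {
    "my_improved_index_type": {
        "properties": {
            "name": {
                "type": "string",
                "fields": {
                    # n-grams at index time only; the query is analyzed with
                    # the standard analyzer so it is not n-grammed as well.
                    "name_gram": {
                        "type": "string",
                        "index_analyzer": "trigrams",
                        "search_analyzer": "standard",
                    },
                    "untouched": {"type": "string", "index": "not_analyzed"},
                    # Same analyzer on both sides, so "shavers" in a query is
                    # stemmed exactly like "shavers" in a document.
                    "name_stemmer": {
                        "type": "string",
                        "analyzer": "my_stemmer_analyzer",
                    },
                    "name_standard": {"type": "string", "analyzer": "standard"},
                },
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```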
