Optimizing prefix search

Rauan_Maemirov · December 24, 2014, 9:39am

Hey all. I've been following this guide and it's
wonderful! http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/proximity-relevance.html
But it doesn't work well for large indexes.
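One plausible reason a phrase_prefix clause degrades on a large index can be shown with a simplified model. phrase_prefix expands its last term into at most max_expansions index terms that start with it, and those terms are collected by walking the term dictionary in sorted order. That is my understanding of how Elasticsearch 1.x does it, so treat it as an assumption. A real index also collects terms per segment, which this model ignores. With a short prefix such as 'th', the cap can fill up with terms like 'than' or 'the' before 'third' is reached. A longer prefix such as 'thir' leaves only a few candidates. The vocabulary below is hypothetical, and the cap is scaled down from 50 to 5.

```python
import bisect


def expand_prefix(sorted_terms, prefix, max_expansions):
    """Simplified model of phrase_prefix expansion (an assumption, not ES code).

    Seeks to the first term >= prefix in a sorted term list and collects
    matching terms in order until max_expansions of them have been taken.
    """
    if max_expansions <= 0:
        return []
    start = bisect.bisect_left(sorted_terms, prefix)
    expanded = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        expanded.append(term)
        if len(expanded) == max_expansions:
            break  # cap reached; any remaining matches are ignored
    return expanded


# Hypothetical title vocabulary; a large index would hold many more "th*" terms.
vocab = sorted([
    "first", "second", "th", "than", "that", "the", "their", "them",
    "then", "there", "these", "they", "thing", "think", "third", "this",
])

print(expand_prefix(vocab, "th", 5))    # ['th', 'than', 'that', 'the', 'their']
print(expand_prefix(vocab, "thir", 5))  # ['third']
```

With the cap raised to 50, this toy vocabulary fits entirely. In this model, the problem only appears once the index holds more terms sharing the prefix than max_expansions allows.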

Here's what I'm trying to do:
1. Find matches that are both relevant and similar to the original form.

{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": {
            "multi_match": {
              "query": "first th",
              "minimum_should_match": "40%",
              "type": "most_fields",
              "slop": 10,
              "fields": ["title", "title_original"]
            }
          },
          "should": {
            "multi_match": {
              "query": "first th",
              "type": "phrase_prefix",
              "slop": 50,
              "fields": ["title", "title_original"],
              "max_expansions": 50
            }
          }
        }
      }
    }
  }
}

2. I'm setting a minimum score of 1 to cut off the long tail.
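This assumes that "minimum score" here means the min_score parameter of the search request body. That parameter sits at the top level, next to query, not inside it. The Python sketch below builds such a body. The single multi_match clause is a placeholder for the full query from step 1, and the helper name with_min_score is hypothetical. Raw relevance scores are not normalized, so a fixed cutoff of 1 can drop more or fewer hits as the index and its term statistics change.

```python
import json


def with_min_score(query, min_score=1):
    """Hypothetical helper: wrap a query clause in a body with a top-level min_score."""
    return {
        "query": query,
        # Hits whose score is below this value are excluded from the results.
        "min_score": min_score,
    }


# Placeholder clause standing in for the bool query from step 1.
placeholder = {
    "multi_match": {"query": "first th", "fields": ["title", "title_original"]}
}

body = with_min_score(placeholder)
print(json.dumps(body, indent=2))
```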

When I search for, say, 'first thir', it works like a charm. But if I
search for 'first th', some items contain the exact word 'th', so
relevance breaks down and every result gets a very low score. What I need
is to throw out items like 'th', 'third', etc., and boost the score of
items that contain both 'first' and a word starting with 'th'. Matching
the whole title 'first th' exactly wouldn't help, because I might need
items like 'first second third'. I tried playing with slop, but it doesn't
solve the problem in general.
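One way to state "contains both 'first' and a word starting with 'th'" directly is sketched below. This is an editor's sketch, not a fix confirmed in the thread. It has three parts:

- a regular match that requires every complete term,
- a prefix match that requires the last (possibly partial) term,
- phrase_prefix kept only as an optional should clause that boosts close, in-order matches.

The helper name build_query is hypothetical. The prefix is lowercased because prefix queries are not analyzed, and this assumes title and title_original use a lowercasing analyzer. The body follows the Elasticsearch 1.x bool, multi_match (cross_fields) and prefix syntax, so check it against the version you run.

```python
import json


def build_query(user_input, fields=("title", "title_original")):
    """Hypothetical helper: complete terms are required, the last term is a required prefix."""
    terms = user_input.split()
    if not terms:
        raise ValueError("empty search input")
    *complete, last = terms
    fields = list(fields)

    must = []
    if complete:
        # Every complete term (e.g. "first") must appear in at least one field.
        must.append({"multi_match": {
            "query": " ".join(complete),
            "fields": fields,
            "type": "cross_fields",
            "operator": "and",
        }})
    # Some field must contain a term starting with the partial last word.
    # Prefix queries skip analysis, so lowercase here to match indexed terms.
    must.append({"bool": {
        "should": [{"prefix": {field: last.lower()}} for field in fields],
        "minimum_should_match": 1,
    }})

    # Optional clause: boost documents where the words also appear close together, in order.
    should = [{"multi_match": {
        "query": user_input,
        "type": "phrase_prefix",
        "fields": fields,
        "slop": 10,
        "max_expansions": 50,
    }}]
    return {"query": {"bool": {"must": must, "should": should}}}


print(json.dumps(build_query("first th"), indent=2))
```

For 'first th', a title such as 'first second third' still matches, because matching only needs 'first' plus some 'th*' term. Word order and distance affect ranking only, through the should clause. The max_expansions cap still applies to that phrase_prefix clause, but here it changes scores rather than deciding which documents match.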

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1d036247-e6d1-462e-86c8-50182956196d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

BillyEm · December 25, 2014, 4:05am

You're trying, I think, to do phrase search with edit-distance embedded.
Certainly a doable task. btw and fwiw: relevance is hardly ever "broken".

Rauan_Maemirov · December 25, 2014, 8:42am

Hey, Billy. Did you mean Elasticsearch Platform — Find real-time answers at scale | Elastic?
Could you point me in the right direction: which parameters should I look at?

Related topics

Topic	Category	Replies	Views	Activity
ES gives very different scores, in match_phrase_prefix, for similar documents even I use DfsQueryThenFetch	Elasticsearch	1	432	July 6, 2017
ElasticSeach Prefix Query For boost	Elasticsearch	1	500	February 6, 2020
[Theory] Improving search result relevance?	Elasticsearch	8	1387	July 6, 2017
Query by keywords and phrase. Performance question	Elasticsearch	4	936	July 6, 2017
Partial word match with singular and plurals: Elasticsearch	Elasticsearch	7	7820	July 6, 2017