Optimizing prefix search

Hey all. I've been following this guide, and it's wonderful:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/proximity-relevance.html
But the approach doesn't work well on large indexes.

Here's what I'm trying to do:
1). Find matches that are both relevant and similar to the original
form.

{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": {
            "multi_match": {
              "query": "first th",
              "minimum_should_match": "40%",
              "type": "most_fields",
              "slop": 10,
              "fields": ["title", "title_original"]
            }
          },
          "should": {
            "multi_match": {
              "query": "first th",
              "type": "phrase_prefix",
              "slop": 50,
              "fields": ["title", "title_original"],
              "max_expansions": 50
            }
          }
        }
      }
    }
  }
}

2). I'm setting a minimum score of 1 to throw out the long tail.

When I search for, say, 'first thir', it works like a charm. But if I
search for 'first th', some items contain the exact word 'th', so
relevance breaks down and every result gets a very low score. What I need
is to throw out items that match only 'th', 'third', etc., and to boost
items that contain both 'first' and a word starting with 'th'. Matching
the whole title against 'first th' wouldn't help, because I also need
items like 'first second third'. I tried playing with slop, but it doesn't
solve the problem in general.
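
To make the requirement concrete, here is a rough sketch (untested against a real index) of one way it could be expressed: split the input into complete words and a trailing partial word, require the complete words with a `multi_match`, require the partial word with a `prefix` query, and keep the `phrase_prefix` clause only as a boost. The helper name `build_prefix_query` and the splitting rule are assumptions made for illustration; it also assumes the fields are lowercased at index time, since `prefix` queries are not analyzed.

```python
import json


def build_prefix_query(text, fields=("title", "title_original"),
                       slop=50, max_expansions=50):
    """Return a search body where the last word of `text` is treated as a prefix.

    Hypothetical helper: the name and the splitting rule are assumptions.
    """
    # Assumes terms are lowercased at index time (e.g. by the standard
    # analyzer), because prefix queries are not analyzed.
    terms = text.lower().split()
    if not terms:
        raise ValueError("query text is empty")
    *complete, partial = terms

    must = []
    if complete:
        # All complete words must occur together in at least one field.
        must.append({
            "multi_match": {
                "query": " ".join(complete),
                "fields": list(fields),
                "operator": "and",
            }
        })
    # The trailing word only needs to be the start of a term in some field.
    must.append({
        "bool": {
            "should": [{"prefix": {field: partial}} for field in fields],
            "minimum_should_match": 1,
        }
    })

    return {
        "query": {
            "bool": {
                "must": must,
                # Optional boost when the words also appear close together.
                "should": {
                    "multi_match": {
                        "query": text,
                        "type": "phrase_prefix",
                        "slop": slop,
                        "fields": list(fields),
                        "max_expansions": max_expansions,
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_prefix_query("first th"), indent=2))
```

With 'first th', a title such as 'first second third' should satisfy both `must` clauses, while a title that contains only 'th' or 'third' should be rejected because 'first' is missing. Whether the score cutoff from step 2 is still needed after that would have to be checked on real data.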

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1d036247-e6d1-462e-86c8-50182956196d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

You're trying, I think, to do phrase search with edit-distance embedded.
Certainly a doable task. btw and fwiw: relevance is hardly ever "broken".
;)

On Wednesday, December 24, 2014 4:39:46 AM UTC-5, Rauan Maemirov wrote:

[...]

Hey, Billy. Did you
mean Elasticsearch Platform — Find real-time answers at scale | Elastic
?
Could you point me to the parameters I should look at?

On Thursday, December 25, 2014 10:05:17 AM UTC+6, BillyEm wrote:

[...]