Text_phrase_prefix scoring and closest match


(Jamie Brough) #1

hi,

I'm implementing an autocomplete using text_phrase_prefix, but 'exact'
matches score lower than longer prefix matches, so the most relevant results
don't appear first and in some cases are not even in the first page of results
(I'd like to avoid returning too many results and sorting at the client).
For example:

index:

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "place": {
      "dynamic": false,
      "type": "object",
      "properties": {
        ...snip...
        "name": {
          "type": "string",
          "analyzer": "standard",
          "store": "yes",
          "term_vector": "with_positions_offsets"
        }
        ...snip...
      }
    }
  }
}

query:

{
  "query": {
    "text_phrase_prefix": {
      "name": "London"
    }
  }
}

As you can see below, "Londonthorpe" has a score of 3.9, whereas "London"
is 1.05 (I'm wondering why the score is 1.05 and not 1, since it is a
perfect match?).
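Two separate effects are probably at work in these scores. First, Lucene's classic scores are relative and not normalised to a maximum of 1, so an exact match has no reason to score exactly 1. Second, the gap between "London" (1.0555032) and every "Little London" (0.65968955) is a ratio of 0.625, which is consistent with the classic DefaultSimilarity length norm 1/sqrt(numTerms) after Lucene stores it in a single lossy byte: 1/sqrt(2) ≈ 0.707 is rounded down to 0.625. The much larger gap to "Londonthorpe" most likely comes from IDF instead: the prefix is expanded into matching terms, and a rare term such as "londonthorpe" weighs far more than "london", which also occurs in every "Little London". That part depends on document counts not shown here, so the sketch below only models the length norm. It assumes the index uses Lucene's classic DefaultSimilarity (the default at the time) and is a Python port of SmallFloat's byte315 encoding as we understand it, not Elasticsearch code; the whitespace split is a crude stand-in for the standard analyzer.

```python
import math
import struct


def float_to_byte315(f: float) -> int:
    """Port of Lucene's SmallFloat.floatToByte315 (3 mantissa bits, zero exponent 15)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw IEEE-754 float32 bits
    small = bits >> (24 - 3)
    if small <= (63 - 15) << 3:
        return 0 if bits <= 0 else 1  # underflow: smallest positive value or zero
    if small >= ((63 - 15) << 3) + 0x100:
        return 255  # overflow: largest encodable value (Java's (byte) -1)
    return small - ((63 - 15) << 3)


def byte315_to_float(b: int) -> float:
    """Decode a byte produced by float_to_byte315."""
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - 3)
    bits += (63 - 15) << 24
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def encoded_length_norm(num_terms: int, boost: float = 1.0) -> float:
    """Classic length norm boost/sqrt(numTerms), after the one-byte round trip."""
    raw = boost / math.sqrt(num_terms)
    return byte315_to_float(float_to_byte315(raw))


if __name__ == "__main__":
    for name in ["London", "Little London"]:
        n = len(name.split())  # rough token count (assumption: standard analyzer splits on spaces)
        print(name, n, encoded_length_norm(n))
    # Ratio between the "Little London" and "London" scores in the results of this post
    print(0.65968955 / 1.0555032)
```

Because both names contain the same term "london", every other factor cancels out, and the stored norms (1.0 versus 0.625) are enough to account for the ratio.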

Is there a way to order results by closest match, so that the shortest
complete match is returned first? If not with text_phrase_prefix, then
perhaps with a custom front-side edge n-gram filter?
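The edge n-gram idea can be modelled in a few lines of plain Python: at index time every token is expanded into its front n-grams ("l", "lo", "lon", ...), so at query time the typed text is looked up as a single term instead of being expanded into many terms with very different IDFs. This is a hypothetical sketch, not Elasticsearch's implementation: `edge_ngrams`, `build_index` and `autocomplete` are invented names, the gram sizes 1 to 20 are arbitrary, and the ordering (whole-name match, then exact token match, then shortest name) is one assumed reading of "closest match".

```python
from collections import defaultdict


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Front edge n-grams of a lowercased token, e.g. 'Lon' -> ['l', 'lo', 'lon']."""
    token = token.lower()
    upper = min(max_gram, len(token))
    return [token[:i] for i in range(min_gram, upper + 1)]


def build_index(names: dict[str, str]) -> dict[str, set[str]]:
    """Map every edge n-gram of every whitespace token to the ids of names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, name in names.items():
        for token in name.split():
            for gram in edge_ngrams(token):
                index[gram].add(doc_id)
    return index


def autocomplete(names: dict[str, str], index: dict[str, set[str]], text: str) -> list[str]:
    """Look up the typed prefix as ONE term, then put the closest matches first."""
    prefix = text.lower()
    hits = index.get(prefix, set())

    def sort_key(doc_id: str):
        name = names[doc_id].lower()
        whole_name_exact = name == prefix
        token_exact = prefix in name.split()
        # False sorts before True, so the "better" flags are negated;
        # ties are broken by shorter name, then alphabetically.
        return (not whole_name_exact, not token_exact, len(name), name)

    return [names[d] for d in sorted(hits, key=sort_key)]


if __name__ == "__main__":
    places = {"1": "Londonthorpe", "2": "Londonderry", "3": "London",
              "4": "Little London", "5": "Lincoln"}
    idx = build_index(places)
    print(autocomplete(places, idx, "London"))
    print(autocomplete(places, idx, "lond"))
```

With the full word typed, "London" and then "Little London" come before the longer names; with only "lond" typed, the shortest name wins.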

thanks for any pointers,

here are the results of the above query:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 34,
    "max_score": 3.9374971,
    "hits": [
      {
        "_index": "places",
        "_type": "place",
        "_id": "82Cxx5olSR29vIa0tnQB6w",
        "_score": 3.9374971,
        "_source": {
          "name": "Londonthorpe",
          "admin2": "Lincolnshire",
          "country": "GBR",
          "location": "v1mth1htztws",
          "rank": 3
        },
        "_grouped": false
      },
      {
        "_index": "places",
        "_type": "place",
        "_id": "0lY8WlnxT1yvCysgCHL0NA",
        "_score": 3.4017,
        "_source": {
          "name": "Londonderry",
          "admin2": "North Yorkshire",
          "country": "GBR",
          "location": "v1wsdygy0zz7",
          "rank": 3
        },
        "_grouped": false
      },
      {
        "_index": "places",
        "_type": "place",
        "_id": "JS2laACxQimS7A740qNMcA",
        "_score": 3.4017,
        "_source": {
          "name": "Londonderry",
          "admin2": "Sandwell",
          "country": "GBR",
          "location": "v1m6d631w63y",
          "rank": 3
        },
        "_grouped": false
      },
      {
        "_index": "places",
        "_type": "place",
        "_id": "20IzrjRWSYSYaHMD24TxWg",
        "_score": 1.0555032,
        "_source": {
          "name": "London",
          "admin2": "City of London",
          "country": "GBR",
          "location": "v1hth63636dy",
          "rank": 1
        },
        "_grouped": false
      },
      {
        "_index": "places",
        "_type": "place",
        "_id": "WGd0GQSGS_e6Y14QoU0zCw",
        "_score": 0.65968955,
        "_source": {
          "name": "Little London",
          "admin2": "Bradford",
          "country": "GBR",
          "location": "v1w637h6gzw6",
          "rank": 3
        },
        "_grouped": false
      },
      {
        "_index": "places",
        "_type": "place",
        "_id": "JVEMN2DdRZqHEVCyJynoWg",
        "_score": 0.65968955,
        "_source": {
          "name": "Little London",
          "admin2": "Powys",
          "country": "GBR",
          "location": "v1m6d631w63y",
          "rank": 3
        },
        "_grouped": false
      },
      {
        "_index": "places",
        "_type": "place",
        "_id": "j5uNub82RRqqr03EefpmtA",
        "_score": 0.65968955,
        "_source": {
          "name": "Little London",
          "admin2": "Shropshire",
          "country": "GBR",
          "location": "v1m6d631w63y",
          "rank": 3
        },
        "_grouped": false
      },
      {
        "_index": "places",
        "_type": "place",
        "_id": "6oXW6HiyQjakxShTcKhpWg",
        "_score": 0.65968955,
        "_source": {
          "name": "Little London",
          "admin2": "Worcestershire",
          "country": "GBR",
          "location": "v1m1wygzz03t",
          "rank": 3
        },
        "_grouped": false
      },
      {
        "_index": "places",
        "_type": "place",
        "_id": "7UOGKwu5RXGyGaLcVkAIAQ",
        "_score": 0.65968955,
        "_source": {
          "name": "Little London",
          "admin2": "Gloucestershire",
          "country": "GBR",
          "location": "v1hygzh00sd1",
          "rank": 3
        },
        "_grouped": false
      },
      {
        "_index": "places",
        "_type": "place",
        "_id": "tnVYn2j2Q-6B6LBWi57g0w",
        "_score": 0.65968955,
        "_source": {
          "name": "Little London",
          "admin2": "Oxfordshire",
          "country": "GBR",
          "location": "v1hy07wy36d0",
          "rank": 3
        },
        "_grouped": false
      }
    ]
  }
}


(Jamie Brough) #2

wow, having read the "Search and Ngram tokenizer" thread
(https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/xK7UhGVF0E8),
I found that this mapping and query were what I was looking for:

http://elasticsearch-users.115913.n3.nabble.com/Question-about-multi-field-and-edge-ngram-td3800000.html

works perfectly.
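For readers who cannot reach the linked thread, the general multi-field plus edge n-gram pattern can be modelled roughly as below. This is not the mapping from that thread: the sub-field name `name.autocomplete`, the simplified lowercase-and-split analyzers and the boost of 10 are assumptions, and the additive score is a toy rather than Lucene's formula. The idea it shows is that one source value is indexed twice, once as plain tokens and once as edge n-grams; the query must match the n-gram sub-field (so prefixes still work) and should match the plain field, whose boost lifts whole-word matches such as "London" above longer names that only share the prefix.

```python
def analyze_standard(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def analyze_edge_ngram(text: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Front edge n-grams of each standard token."""
    grams: list[str] = []
    for token in analyze_standard(text):
        upper = min(max_gram, len(token))
        grams.extend(token[:i] for i in range(min_gram, upper + 1))
    return grams


def index_multi_field(name: str) -> dict[str, set[str]]:
    """Index one source value twice, like a multi-field: 'name' and 'name.autocomplete'."""
    return {
        "name": set(analyze_standard(name)),
        "name.autocomplete": set(analyze_edge_ngram(name)),
    }


def score(doc: dict[str, set[str]], text: str, exact_boost: float = 10.0) -> float:
    """must: every query token hits the n-gram sub-field; should: whole tokens hit 'name'."""
    query_tokens = analyze_standard(text)
    if not all(t in doc["name.autocomplete"] for t in query_tokens):
        return 0.0  # the must clause fails, so the document is not a hit
    exact_hits = sum(1 for t in query_tokens if t in doc["name"])
    return 1.0 + exact_boost * exact_hits


if __name__ == "__main__":
    names = ["Londonthorpe", "Londonderry", "London", "Little London", "Lincoln"]
    docs = {n: index_multi_field(n) for n in names}
    hits = [n for n in names if score(docs[n], "London") > 0]
    print(sorted(hits, key=lambda n: -score(docs[n], "London")))
```

In a real index the ordering within each tier would still come from Lucene's own scoring; the sketch only shows why the boosted plain field moves whole-word matches to the top.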

(Replying to Jamie Brough's post #1 above, sent Thursday, 29 March 2012 16:07:49 UTC+1.)


(alheim) #3

:) Seems we are all facing the same issues.

Maybe a tutorial could be posted on the website? I can write it.

Suggestion (autocomplete) is key for real-time web applications.

(Replying to Jamie Brough's post #2 above, sent Thu, Mar 29, 2012 at 5:35 PM.)

--
Alexandre Heimburger
VP Engineering
blueKiwi Software

