Number of matched positions as score

Dany_Gielow · July 1, 2014, 11:48am

Hello,

I want to match only documents which match all positions.
My approach would be to index the number of positions and compare it to the
number of matched positions.

Every position that has multiple tokens (stacked tokens) should count only
as 1.

Given the following positions in a field:
Position 1: red
Position 2: car, automobile

These queries should be scored as follows:

"red": 1
"car": 1
"automobile": 1
"red car": 2
"red automobile": 2
"car automobile": 1
"fast red car": 2

What approach should I use to get the number of matched positions as a
score?
I guess I need a custom similarity for that.

Thanks in advance
Dany

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3d90446e-b1d8-4c8d-8566-d10db7e82369%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

vineeth_mohan_2 · July 1, 2014, 12:11pm

Hello Dany,

I didn't really understand what you mean by position.
Could you clarify?

Thanks
Vineeth

Dany_Gielow · July 1, 2014, 12:29pm

Hi Vineeth,

The position is determined by the PositionIncrementAttribute in a Lucene
TokenStream.
So when I say multiple tokens at the same position, I mean tokens that have
a positionIncrement of 0.

Such tokens are generated, for example, by a SynonymFilter, which expands
all synonyms so that every synonym shares the same position.
In my example "car" and "automobile" are synonyms that have been expanded.
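
To make the position bookkeeping concrete, here is a minimal Python sketch of how position increments turn into absolute positions. It is not Lucene code: the function name `assign_positions` and the (term, increment) tuples are made up for illustration, and the numbering starts at 1 only to match the example in this thread.

```python
from collections import defaultdict


def assign_positions(tokens):
    """Map each term to the absolute positions it occupies.

    `tokens` is a list of (term, position_increment) pairs in stream order.
    An increment of 0 stacks the token on the previous token's position.
    """
    positions = defaultdict(list)
    current = 0  # assumption: first token lands on position 1, as in the example
    for term, increment in tokens:
        current += increment
        positions[term].append(current)
    return dict(positions)


# "automobile" is a synonym of "car", emitted with a position increment of 0.
stream = [("red", 1), ("car", 1), ("automobile", 0)]
print(assign_positions(stream))
```

Here "car" and "automobile" both end up on position 2, which is why they should count only once.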

vineeth_mohan_2 · July 1, 2014, 1:27pm

Hello Dany,

This is (surprisingly) possible.

{
  "size": 30,
  "from": 0,
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "query_string": {
          "query": "red",
          "fields": ["message"]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": "termInfo = _index['message'].get('red', _POSITIONS | _CACHE); positions = []; for (pos : termInfo) { if (!positions.contains(pos.position)) { positions.add(pos.position); } } return positions.size();"
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}

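Within a function_score script you can access term positions and use them
programmatically to compute the score.
The example above shows this, and it works in the latest version.

More documentation: the term positions, offsets and payloads section of
reference/current/modules-advanced-scripting.html, and
reference/current/query-dsl-function-score-query.html.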

Thanks
Vineeth

Dany_Gielow · July 1, 2014, 1:51pm

Hello Vineeth,

Your script works like a charm. Thank you very much.

I will probably write a native script function that supports multiple
query terms.

Thank you again
Dany

vineeth_mohan_2 · July 1, 2014, 2:06pm

Hello,

Exactly. Go for a script that checks the positions of each query term and
counts a position toward the score only if it has not already been counted.

Thanks
Vineeth
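
The multi-term logic described here can be sketched in plain Python, outside Elasticsearch. This is only an illustration of the counting idea, not the native script itself: the helper names `matched_positions` and `matches_all_positions` are hypothetical, the query is split on whitespace, and the term-to-positions map is assumed to be available already.

```python
def matched_positions(term_positions, query):
    """Count the distinct field positions hit by any term of the query."""
    seen = set()
    for term in query.split():
        # Terms absent from the field contribute no positions.
        seen.update(term_positions.get(term, ()))
    return len(seen)


def matches_all_positions(term_positions, query, total_positions):
    """True when the query covers every position indexed for the field."""
    return matched_positions(term_positions, query) == total_positions


# Field from the original question: position 1 = red, position 2 = car/automobile.
field = {"red": [1], "car": [2], "automobile": [2]}

for q in ["red", "car", "automobile", "red car",
          "red automobile", "car automobile", "fast red car"]:
    print(q, matched_positions(field, q))
```

Using a set means stacked tokens such as "car" and "automobile" count once, and comparing the result with the indexed number of positions (2 here) gives the "matches all positions" check from the first post.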
