Number of matched positions as score


(Dany Gielow) #1

Hello,

I want to match only documents which match all positions.
My approach would be to index the number of positions and compare it to the
number of matched positions.

Every position that has multiple tokens (stacked tokens) should count only
as 1.

Given the following positions in a field:
Position 1: red
Position 2: car, automobile

These queries should be scored as follows:

"red": 1
"car": 1
"automobile": 1
"red car": 2
"red automobile": 2
"car automobile": 1
"fast red car": 2

What approach should I use to get the number of matched positions as a
score?
I guess I need a custom similarity for that.

Thanks in advance
Dany

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3d90446e-b1d8-4c8d-8566-d10db7e82369%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(vineeth mohan-2) #2

Hello Dany,

I didn't really understand what you mean by "position".
Could you please clarify?

Thanks
Vineeth



(Dany Gielow) #3

Hi Vineeth,

The position is determined by the PositionIncrementAttribute in a Lucene
TokenStream.
So when I say multiple tokens at the same position, I mean tokens that have
a positionIncrement of 0.

Such tokens are generated, for example, by a SynonymFilter, which expands
all synonyms. All expanded synonyms share the same position.
In my example, "car" and "automobile" are synonyms that have been expanded.
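
As an illustration only (not part of the original message), the relationship between position increments and positions can be sketched in plain Python. The function name `assign_positions` and the (term, increment) tuple format are assumptions made for this sketch, not Lucene API; the numbering follows the 1-based example from the first post.

```python
def assign_positions(tokens):
    """Map a stream of (term, position_increment) pairs to positions.

    An increment of 0 stacks the token on the previous position,
    which is how an expanded synonym ends up sharing a position.
    """
    positions = {}
    current = 0
    for term, increment in tokens:
        current += increment
        positions.setdefault(current, []).append(term)
    return positions


# "automobile" is a synonym of "car", so its increment is 0.
stream = [("red", 1), ("car", 1), ("automobile", 0)]
print(assign_positions(stream))
```

With the stream above, "red" lands on position 1 while "car" and "automobile" share position 2.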



(vineeth mohan-2) #4

Hello Dany,

This is (surprisingly) possible.

{
  "size": 30,
  "from": 0,
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "query_string": {
          "query": "red",
          "fields": ["message"]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": "termInfo = _index['message'].get('red', _POSITIONS | _CACHE); positions = []; for (pos : termInfo) { if (!positions.contains(pos.position)) { positions.add(pos.position); } } return positions.size();"
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}

In a function_score script you can access the term positions and use them
programmatically to compute the score.
The query above is an example of this; it works in the latest version.

More documentation:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html#_term_positions_offsets_and_payloads
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

Thanks
Vineeth



(Dany Gielow) #5

Hello Vineeth,

Your script works like a charm. Thank you very much.

I will probably write a native script function that supports multiple
query terms.

Thank you again
Dany



(vineeth mohan-2) #6

Hello,

Exactly. Go for a script that checks the positions of each query term and
counts a position toward the score only if it has not already been counted.
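
As an illustration only (not code from the thread), the multi-term logic can be sketched in plain Python, outside Elasticsearch. The names `matched_positions` and `matches_all_positions` and the position-to-terms dictionary are assumptions made for this sketch; a real native script would read the positions from the index instead. The second function reflects the goal from the first post of comparing the matched count with the total number of positions.

```python
def matched_positions(field_positions, query):
    """Count the distinct positions hit by at least one query term.

    field_positions maps a position to the terms stacked on it, so
    synonyms sharing a position are counted only once.
    """
    query_terms = set(query.split())
    return sum(
        1 for terms in field_positions.values() if query_terms & set(terms)
    )


def matches_all_positions(field_positions, query):
    """True when every position in the field is matched by the query."""
    return matched_positions(field_positions, query) == len(field_positions)


# The example field from the first post.
field = {1: ["red"], 2: ["car", "automobile"]}

for q in ["red", "car", "automobile", "red car",
          "red automobile", "car automobile", "fast red car"]:
    print(q, matched_positions(field, q), matches_all_positions(field, q))
```

For the example field this yields the scores listed in the first post: 1 for each single-term query and for "car automobile", and 2 for "red car", "red automobile" and "fast red car".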

Thanks
Vineeth


