Score based on term frequency only


(kevins) #1

I would like to score based entirely on term count.

For example, given the following two documents:

  1. { "apple" }

  2. { "apple apple" }

Searching "apple" ranks the first before the second. I wish to rank the
second, in which the term occurs twice, with a higher score.

Can someone please point me in the right direction for this?

Thank you.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1bb386ae-3ab5-4878-9d29-6462eaff14c7%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ivan Brusic) #2

You could provide your own Similarity class as a plugin. I don't have any
sample code in front of me, but it would be based on TFIDFSimilarity, and
you would basically need to ignore the norms and other values.

http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

The IDF portion could probably remain, since it weights the different terms
in your query against each other rather than changing how a single term
scores from one document to the next.
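
To make the idea concrete, here is a minimal Python sketch (not Lucene code, and not from this thread) that compares a simplified TF-IDF-style score with a count-only score. The function names, the whitespace tokenizer, and the fixed `idf=1.0` are assumptions for illustration, and Lucene's real formula has further factors (query norm, coord, boosts) that are left out. In an actual plugin, the equivalent change would presumably be overriding `tf` and `lengthNorm` in a TFIDFSimilarity subclass.

```python
import math


def classic_like_score(text, term, idf=1.0):
    """Simplified TF-IDF-style score: dampened tf times a length norm.

    This is an illustrative model only, not Lucene's exact formula.
    """
    tokens = text.split()
    freq = tokens.count(term)
    if freq == 0:
        return 0.0
    tf = math.sqrt(freq)                        # dampened term frequency
    length_norm = 1.0 / math.sqrt(len(tokens))  # shorter fields score higher
    return tf * idf * length_norm


def count_only_score(text, term, idf=1.0):
    """Count-only variant: raw term count, length norm ignored."""
    freq = text.split().count(term)
    return float(freq) * idf


docs = ["apple", "apple apple"]

# Rank the two example documents by the count-only score, best first.
ranked = sorted(docs, key=lambda d: count_only_score(d, "apple"), reverse=True)

for doc in docs:
    print(repr(doc),
          "classic-like:", classic_like_score(doc, "apple"),
          "count-only:", count_only_score(doc, "apple"))
```

In this simplified model the two documents tie under the classic-style score (up to floating-point rounding), because the extra occurrence is cancelled by the length norm. With the count-only score, "apple apple" ranks first. As far as I know, Lucene also stores the length norm lossily in a single byte, which may be what pushes the shorter document ahead in practice; the sketch does not model that.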

Cheers,

Ivan




(Britta Weber) #3

You could also use a script as described here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
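
As a rough sketch of what such a request might look like, the Python below builds a `function_score` query whose score is replaced by the term frequency read inside a script. The field name `text`, the helper name `tf_script_query`, and the exact script syntax `_index[field][term].tf()` are assumptions based on my reading of the linked page at the time; check them against the version you run, since the scripting API may differ or be unavailable there.

```python
import json


def tf_script_query(field, term):
    """Build a request body that scores each hit by the raw term frequency.

    Hypothetical helper: the script expression follows the term-statistics
    syntax described on the advanced scripting page and is an assumption.
    The term is interpolated without escaping, so it must not contain quotes.
    """
    script = "_index['%s']['%s'].tf()" % (field, term)
    return {
        "query": {
            "function_score": {
                # Only documents matching the term are scored at all.
                "query": {"match": {field: term}},
                # The script result becomes the function score.
                "script_score": {"script": script},
                # Replace the query score instead of multiplying with it.
                "boost_mode": "replace",
            }
        }
    }


body = tf_script_query("text", "apple")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of a search request. Using `boost_mode` set to `replace` is a design choice here so that the final score is the term count alone rather than the count combined with the default relevance score.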

Cheers,
Britta




(Ivan Brusic) #4

Great feature. However, it looks like it is only available in the master
branch: https://github.com/elasticsearch/elasticsearch/issues/3772

--
Ivan




(David Zweigenhaft) #5

I am new to Elasticsearch, and I would also like to score based entirely on
term count. I would like to know how you solved it.

Could you share your solution?

Actually, I would like to count how many times a phrase repeats in a
document (for example, the phrase "apple apple"). Do you think it is
possible to use the term frequency to count phrases?

I'm really stuck on this and need help.

Thank you.



