Ignoring Density/Frequency

Rob_Styles · May 7, 2013, 2:05pm

Hi All,

I'm trying to change search ranking and can't find info on how to do it. I
suspect I'm using the wrong terminology.

I am searching names within several million documents. Say I search for
"Tom OR Jones". Some of my docs contain many names, some only a few. The
result is that a doc containing Aled Jones as its only name will score
higher than a doc that includes Tom Jones along with a handful of other
names.

This is expected, since word density/frequency is taken into account in the
ranking, right?

Is there a way to configure the ranking so that the length of the doc is
not taken into account in the scoring, such that all docs containing Tom
Jones would rank higher than Aled Jones regardless of doc length?

What's the correct terminology to be searching for these kinds of settings?

thanks

rob

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Randall_McRee · May 7, 2013, 5:24pm

You want omit_norms: true for the field(s) whose behaviour you want to
change, I guess. Dissect the "explain" part of your query and you will see
the tf-idf calculations.

You can google for much more information.
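
To make the explain output easier to read, here is a rough Python sketch of the classic Lucene-style tf-idf formula with the length norm switched on and off. It is a simplification under stated assumptions, not Elasticsearch's actual code: queryNorm and the lossy norm encoding are left out, and the documents, document frequencies and corpus size are made-up numbers.

```python
import math


def classic_score(query_terms, doc_terms, doc_freq, num_docs, use_norms=True):
    """Simplified tf-idf score for an OR query over one field."""
    matched = [t for t in query_terms if t in doc_terms]
    if not matched:
        return 0.0
    # coord: fraction of the query terms that this document matches
    coord = len(matched) / len(query_terms)
    # length norm: fewer terms in the field means a larger weight;
    # omitting norms is modelled here as a constant 1.0
    norm = 1.0 / math.sqrt(len(doc_terms)) if use_norms else 1.0
    total = 0.0
    for term in matched:
        tf = math.sqrt(doc_terms.count(term))
        idf = 1.0 + math.log(num_docs / (doc_freq[term] + 1))
        total += tf * idf ** 2 * norm
    return coord * total


# Illustrative data only (assumed, not taken from a real index).
query = ["tom", "jones"]
short_doc = ["aled", "jones"]
long_doc = ["tom", "jones"] + ["name%d" % i for i in range(38)]
doc_freq = {"tom": 1000, "jones": 1000}
num_docs = 1000000

for use_norms in (True, False):
    short = classic_score(query, short_doc, doc_freq, num_docs, use_norms)
    long_ = classic_score(query, long_doc, doc_freq, num_docs, use_norms)
    print("norms on" if use_norms else "norms off", "short doc wins:", short > long_)
```

With these made-up numbers the short Aled Jones doc wins while norms are on, and the longer Tom Jones doc wins once they are off. How long a doc has to be before that flip happens depends on the real term statistics, so check it against your own explain output.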

Randall_McRee · May 7, 2013, 5:31pm

Looked at our implementation again--there are three things relevant to your
case.

1. You may need DFS_QUERY_THEN_FETCH to get accurate tf-idf calculations
   across shards.
2. Use omit_norms, as mentioned earlier.
3. Use omit_tf. We use both for several fields.

Those are the three "levers" to investigate.
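
As a rough sketch of how the three levers might look in a request, the Python below just builds the JSON bodies and prints them. Several things here are assumptions: the index, type and field names are hypothetical, the "string" field type and omit_norms spelling are from the releases current at the time of this thread (newer releases use a norms setting instead), and "omit_tf" is taken to mean not indexing term frequencies, which I believe is written index_options: docs (older releases called it omit_term_freq_and_positions).

```python
import json

INDEX = "people"   # hypothetical index name
DOC_TYPE = "doc"   # hypothetical type name
FIELD = "names"    # hypothetical field name


def build_mapping(omit_norms=True, docs_only=True):
    """Mapping body for the names field with the two index-time levers."""
    field = {"type": "string"}
    if omit_norms:
        # lever 2: do not store length norms for this field
        field["omit_norms"] = True
    if docs_only:
        # lever 3 (assumed meaning of omit_tf): index doc ids only,
        # so term frequencies and positions are not stored
        field["index_options"] = "docs"
    return {DOC_TYPE: {"properties": {FIELD: field}}}


def build_search(query_text):
    """Search path and body using the search-time lever plus explain."""
    # lever 1: gather term statistics from all shards before scoring
    path = "/%s/_search?search_type=dfs_query_then_fetch" % INDEX
    body = {
        "explain": True,  # return the per-document score breakdown
        "query": {"match": {FIELD: {"query": query_text, "operator": "or"}}},
    }
    return path, body


print(json.dumps(build_mapping(), indent=2))
path, body = build_search("Tom Jones")
print(path)
print(json.dumps(body, indent=2))
```

The two mapping settings only take effect for data indexed after the change, so expect to reindex. Dropping positions also rules out phrase queries on that field, which matters if you ever want "Tom Jones" as an exact phrase.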

Hope this helps.

Rob_Styles · May 8, 2013, 8:48am

Fantastic - thanks for the pointers

rob
