I'm trying to change search ranking and can't find info on how to do it. I
suspect I'm using the wrong terminology.
I am searching names within several million documents. Say I search for
"Tom OR Jones". Some of my docs contain many names, some only a few. The
result is that a doc containing Aled Jones as the only name will score
higher than a doc that includes Tom Jones and a handful of other names as
well.
This is expected, as word density/frequency is taken into account in the
ranking, right?
Is there a way to configure the ranking so that the length of the doc is
not taken into account in the scoring, such that all docs containing Tom
Jones would rank higher than Aled Jones regardless of the length of the doc?
What's the correct terminology to search for to find these kinds of settings?
You want omit_norms: true on the field(s) whose behaviour you want to
change, I guess. Run your query with "explain" enabled and dissect the
output; you will see the tf-idf calculations, including the field-length
norm that favours short fields.
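To make that suggestion concrete, here is a rough sketch of what the mapping and the search request might look like, written as Python dicts and printed as JSON. The field name "names" and the type name "doc" are made up for illustration, and the mapping syntax assumes the string-field options of Elasticsearch versions from around this time; check the docs for your version. Existing documents would most likely need reindexing for a norms change to take effect.

```python
import json

# Hypothetical names, chosen only for this illustration.
FIELD = "names"
TYPE = "doc"

# Mapping fragment: disable length normalisation on the names field.
mapping = {
    TYPE: {
        "properties": {
            FIELD: {"type": "string", "omit_norms": True}
        }
    }
}

# Search body: "explain" asks for a per-hit breakdown of the score.
search_body = {
    "explain": True,
    "query": {
        "query_string": {"default_field": FIELD, "query": "Tom OR Jones"}
    },
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

In the explain output for each hit, look for the fieldNorm entry: that is the length-based factor that omit_norms removes.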
On Tuesday, May 7, 2013 6:31:24 PM UTC+1, RKM wrote:
Looked at our implementation again. There are three things relevant to
your case:
1. You may need DFS_QUERY_THEN_FETCH to get accurate tf-idf calculations
across shards (by default each shard scores with its own local term
statistics).
2. Use omit_norms, as mentioned earlier.
3. Use omit_tf. We use both for several fields.
Those are the three "levers" to investigate.
Hope this helps.
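To see why the norms and term-frequency levers matter, here is a small Python sketch of a simplified Lucene-style tf-idf score for an OR query on a single field. It is not the real implementation: query normalisation and boosts are left out, idf defaults to 1 for every term, and real Lucene stores norms in a lossy single byte, so actual numbers will differ. The function name and its parameters are invented for this example. As far as I understand it, "omit_tf" corresponds to omit_term_freq_and_positions in older Elasticsearch mappings, or index_options: docs in newer ones; note that this also drops positions, so phrase queries on that field stop working.

```python
import math


def classic_score(doc_terms, query_terms, use_norms=True, use_tf=True, idf=None):
    """Simplified tf-idf score of one field for an OR query (illustration only)."""
    idf = idf or {}
    # Length norm: shorter fields get a larger weight; omitting norms fixes it at 1.
    norm = 1.0 / math.sqrt(len(doc_terms)) if use_norms else 1.0
    matched = 0
    total = 0.0
    for term in query_terms:
        freq = doc_terms.count(term)
        if freq == 0:
            continue
        matched += 1
        # Term frequency: repeated terms count more unless tf is omitted.
        tf = math.sqrt(freq) if use_tf else 1.0
        total += tf * idf.get(term, 1.0) ** 2 * norm
    # Coordination factor: rewards docs matching more of the query terms.
    coord = matched / len(query_terms)
    return coord * total


query = ["tom", "jones"]
short_doc = ["aled", "jones"]                                    # one name only
long_doc = ["tom", "jones"] + ["name%d" % i for i in range(38)]  # 40 terms

for label, norms in (("with norms", True), ("norms omitted", False)):
    s = classic_score(short_doc, query, use_norms=norms)
    l = classic_score(long_doc, query, use_norms=norms)
    print(label, "short:", round(s, 4), "long:", round(l, 4))
```

With norms enabled the short "Aled Jones" doc edges out the long "Tom Jones" doc; with norms omitted the doc matching both terms wins regardless of length. The DFS_QUERY_THEN_FETCH lever is outside this sketch, since it only affects where the idf statistics come from.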
On Tue, May 7, 2013 at 10:24 AM, Randall McRee <randal...@gmail.com> wrote:
You want omit_norms: true for the field(s) whose behaviour you want to
change, I guess. Dissect the "explain" part of your query and you will see
the tf-idf calculations.
thanks
rob