Subject line says it all. Is there an easy way to change the IDF
calculation so that it uses the number of documents of the matched type (or
of a predetermined type specified at query time) rather than the total
number of all documents in the index?
I am using documents with parent/child relationships, and as we scale up,
we could have orders of magnitude more children than parents, and it is the
parents that are important for IDF scoring.
There are other reasons why we can't use separate indexes to control this.
Yes, I am already using different field names across parents/children where I
want to isolate IDF from each other.
However, my understanding of my particular issue is that it's not about
the isolation provided by the field names, it's about the maxDocs number
that feeds into the maths calculating the IDF. The maxDocs number
being used is the total number of documents in the index. I want it to use
the total number of documents of a particular document type.
Michael
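To make the maxDocs concern concrete, here is a minimal sketch of the arithmetic. It assumes the classic Lucene-style formula idf = 1 + ln(docCount / (docFreq + 1)); the document counts and the helper name classic_idf are made up for illustration and are not taken from the thread.

```python
import math


def classic_idf(doc_freq: int, doc_count: int) -> float:
    """Classic TF-IDF style inverse document frequency (assumed formula)."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))


# Hypothetical index: far more children than parents.
parents = 10_000
children = 10_000_000
doc_freq = 500  # number of parent documents containing the term (made up)

# What the thread describes: the document count covers the whole index.
idf_index_wide = classic_idf(doc_freq, parents + children)

# What is being asked for: the document count covers only the parent type.
idf_per_type = classic_idf(doc_freq, parents)

# Under this formula the two values differ by ln(total / parents),
# an offset that grows as more children are added to the index.
offset = idf_index_wide - idf_per_type

print(f"index-wide idf: {idf_index_wide:.3f}")
print(f"per-type idf:   {idf_per_type:.3f}")
print(f"offset:         {offset:.3f}")
```

Under this assumed formula, every term on the parent documents picks up the same additive offset, so the IDF of parent terms drifts upward as the number of children grows even though the parents themselves have not changed.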
On Monday, January 6, 2014 10:01:03 PM UTC+11, Adrien Grand wrote:
Hi Michael,
Unfortunately this is not possible. Maybe you could use different field
names for parents and children to work around this issue?
On Mon, Jan 6, 2014 at 3:42 AM, Michael Lawler <mic...@lawler.id.au>
wrote:
Hi,
Subject line says it all. Is there an easy way to change the IDF
calculation so that it uses the number of documents of the matched type (or
of a predetermined type specified at query time) rather than the total
number of all documents in the index.
I am using documents with parent/child relationships, and as we scale up,
we could have orders of magnitude more children than the parents which are
important for IDF scoring.
There are other reasons why we can't use separate indexes to control this.
Facing the same issue. Is there a way to make elasticsearch compute idf
with maxDoc =
instead of maxDoc =
On Monday, January 6, 2014 3:12:20 PM UTC-5, Michael Lawler wrote:
Hi Adrien,
Thanks for your comment.
Yes, I am already using different field names across parents/children where
I want to isolate IDF from each other.
However, my understanding of my particular issue is that it's not about
the isolation provided by the field names, it's about the maxDocs number
that feeds into the maths calculating the IDF. The maxDocs number
being used is the total number of documents in the index. I want it to use
the total number of documents of a particular document type.
Michael