Controlling IDF calculation to use num docs of matched type rather than all docs in Index


(Michael Lawler) #1

Hi,

Subject line says it all. Is there an easy way to change the IDF
calculation so that it uses the number of documents of the matched type (or
of a predetermined type specified at query time) rather than the total
number of documents in the index?

I am using documents with parent/child relationships, and as we scale up,
we could have orders of magnitude more children than parents. It is the
parents that matter for IDF scoring.

There are other reasons why we can't use separate indexes to control this.

regards,
Michael

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3c6580a1-3749-4e46-ab4d-f2beaded006a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Adrien Grand) #2

Hi Michael,

Unfortunately this is not possible. Maybe you could use different field
names for parents and children to work around this issue?
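
As a rough illustration of the workaround suggested above, here is a minimal
sketch of a parent/child mapping in which the two types share no field names.
The type names (parent_doc, child_doc), the field names and the helper
shared_fields are hypothetical and do not come from this thread; the mapping
is written as a plain Python dict in the style of mappings from that era.

```python
# Hypothetical mapping: parent and child types use distinct field names,
# so term statistics such as document frequency are kept per field.
mappings = {
    "parent_doc": {
        "properties": {
            "parent_title": {"type": "string"},
        }
    },
    "child_doc": {
        "_parent": {"type": "parent_doc"},
        "properties": {
            "child_title": {"type": "string"},
        },
    },
}


def shared_fields(mappings):
    """Return the field names that appear in more than one type."""
    seen = set()
    shared = set()
    for type_mapping in mappings.values():
        for field in type_mapping.get("properties", {}):
            if field in seen:
                shared.add(field)
            seen.add(field)
    return shared


# An empty set means no field is shared between the two types.
overlap = shared_fields(mappings)
```

Keeping the field names disjoint separates the per-field document
frequencies, but it does not change the document count the index reports.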

--
Adrien Grand



(Michael Lawler) #3

Hi Adrien,

Thanks for your comment.

Yes, I am already using different field names across parents and children
where I want to isolate their IDF from each other.

However, my understanding of my particular issue is that it's not about
the isolation provided by the field names, it's about the maxDocs number
that goes into the maths calculating the IDF. The maxDocs number
being used is the total number of documents in the index. I want it to use
the total number of documents of a particular document type.
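
To make the effect concrete, here is a small worked sketch. It assumes the
classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)); the
document counts and the function name classic_idf are made up for
illustration and are not taken from a real index.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as in Lucene's classic TF-IDF similarity (assumed formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical index: far more children than parents.
parents = 1_000
children = 1_000_000

# A term that occurs in 9 parent documents, in a parent-only field.
doc_freq = 9

# What the index does: the document count covers every type.
idf_index_wide = classic_idf(doc_freq, parents + children)

# What is being asked for: the document count covers parents only.
idf_parents_only = classic_idf(doc_freq, parents)

# The gap depends only on the ratio of the two document counts,
# not on the term's document frequency.
gap = idf_index_wide - idf_parents_only
expected_gap = math.log((parents + children) / parents)
```

Under this formula every term in a parent field has its IDF raised by the
same amount, ln((parents + children) / parents), and that amount grows as
more children are added.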

Michael



(Riyaz) #4

Facing the same issue. Is there a way to make Elasticsearch compute IDF
with maxDoc set to the number of documents of a given type instead of the
total number of documents in the index?




(system) #6