Controlling IDF calculation to use num docs of matched type rather than all docs in Index

Michael_Lawler · January 6, 2014, 2:42am

Hi,

Subject line says it all. Is there an easy way to change the IDF
calculation so that it uses the number of documents of the matched type (or
of a predetermined type specified at query time) rather than the total
number of documents in the index?

I am using documents with parent/child relationships, and as we scale up,
we could have orders of magnitude more children than parents. It is the
parents that are important for IDF scoring.

There are other reasons why we can't use separate indexes to control this.

regards,
Michael

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3c6580a1-3749-4e46-ab4d-f2beaded006a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jpountz · January 6, 2014, 11:01am

Hi Michael,

Unfortunately this is not possible. Maybe you could use different field
names for parents and children to work around this issue?
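
A minimal sketch of what that workaround could look like, assuming the typed parent/child mappings of Elasticsearch 1.x. The type names (`parent_doc`, `child_doc`) and field names (`parent_title`, `child_title`) are hypothetical and not taken from this thread. Because document frequency is tracked per field, a term in `parent_title` is only counted against parent documents:

```python
# Hypothetical Elasticsearch 1.x-style mapping body (names are illustrative only).
# Parents and children use disjoint field names, so per-field term statistics
# such as docFreq are not mixed between the two types.
mapping = {
    "mappings": {
        "parent_doc": {
            "properties": {
                "parent_title": {"type": "string"},
            }
        },
        "child_doc": {
            "_parent": {"type": "parent_doc"},  # links each child to its parent
            "properties": {
                "child_title": {"type": "string"},
            },
        },
    }
}


def field_names(type_name):
    """Return the set of field names declared for one mapping type."""
    return set(mapping["mappings"][type_name]["properties"])


# Field names shared by both types; an empty set means the term
# statistics of parents and children stay separate.
shared = field_names("parent_doc") & field_names("child_doc")
```

This only isolates the per-field statistics. The total document count used in the IDF formula is still taken from the whole index.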

--
Adrien Grand

Michael_Lawler · January 6, 2014, 8:12pm

Hi Adrien,

Thanks for your comment.

Yes, I am already using different field names across parents/children where I
want to isolate IDF from each other.

However, my understanding of my particular issue here is that it's not about
the isolation provided by the field names, it's about the maxDoc number
that is the numerator in the maths calculating the IDF. The maxDoc number
being used is the total number of documents in the index. I want it to use
the total number of documents of a particular document type.
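
To illustrate how much the maxDoc value moves the result, here is a rough sketch. It assumes the classic Lucene TF-IDF form, idf = 1 + ln(maxDoc / (docFreq + 1)); the similarity actually configured may differ. The document counts are made-up numbers, not measurements from a real index:

```python
import math


def classic_idf(doc_freq, max_doc):
    """IDF in the classic Lucene TF-IDF form: 1 + ln(max_doc / (doc_freq + 1))."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))


# Hypothetical index: 10 thousand parents, 10 million children.
parents = 10_000
children = 10_000_000
doc_freq = 99  # parent documents containing the term (field used by parents only)

# What the index does today: maxDoc counts every document in the index.
idf_index_wide = classic_idf(doc_freq, parents + children)

# What is being asked for: maxDoc counts only documents of the parent type.
idf_parent_only = classic_idf(doc_freq, parents)

# The gap depends only on the ratio of the two document counts,
# not on the term: ln((parents + children) / parents).
gap = idf_index_wide - idf_parent_only

print(f"index-wide IDF:  {idf_index_wide:.3f}")
print(f"parent-only IDF: {idf_parent_only:.3f}")
print(f"gap:             {gap:.3f}")
```

Under this formula every term in a parent-only field gets the same additive boost, so rare and common terms end up closer together in relative terms than they would with a type-specific count.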

Michael

Riyaz · June 3, 2014, 9:55pm

Facing the same issue. Is there a way to make elasticsearch compute idf
with maxDoc =
instead of maxDoc =
