Is there any way to change a term's weight in the context of one
particular document? In other words, to change the TF value of a specific
term for a specific document?
I need this trick to promote specific documents in the context of
predefined queries. Maybe there's a better way to achieve it? I
just can't think of any other than TF manipulation.
No, you can't do that. But you can control it on the search side, by
constructing a boosted query filtered to the specific docs. This solution
really depends on the number of custom weights you wish to set, though.
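A minimal sketch of the search-side approach. It assumes a recent Elasticsearch version, where the `function_score` query (which replaced the `custom_score` query discussed in this thread) multiplies the score of documents matching a filter by a `weight`. The field name `body` and the helper `boosted_query` are hypothetical. The sketch also shows why the approach scales poorly: every custom weight adds one more clause to every query.

```python
import json


def boosted_query(text, doc_weights, field="body"):
    """Build a function_score query body that multiplies the relevance
    score of specific documents by a per-document weight.

    doc_weights maps a document id to its multiplier; documents that match
    no function keep their normal text score.
    """
    functions = [
        # One filter + weight clause per document: the query grows linearly
        # with the number of custom weights.
        {"filter": {"ids": {"values": [doc_id]}}, "weight": weight}
        for doc_id, weight in sorted(doc_weights.items())
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match": {field: text}},
                "functions": functions,
                # multiply the text score by the matching function's weight
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(boosted_query("woman", {"1231231231": 1.5}), indent=2))
```

With a handful of promoted documents this works well. With thousands of them, every request carries thousands of clauses.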
On Fri, Sep 10, 2010 at 4:39 PM, Mykhailo Korbakov <rmihael@gmail.com> wrote:
Unfortunately, that's not my case.
My goal is to make changes like this: "set the weight of term 'woman' in
the document with id 1231231231 to 1.5". I don't think that's possible with
custom_score, or at least I'm missing something important about it.
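This is a toy model of what is being asked for. It is plain Python, not how Lucene stores anything. In it, the TF factor of one term can be overridden for one document. `ToyIndex` and `set_term_weight` are hypothetical names. The override lives in the index, per document, which a query-side boost cannot express compactly.

```python
import math
from collections import Counter


class ToyIndex:
    """In-memory index whose TF factor can be overridden per (document, term)."""

    def __init__(self):
        self.docs = {}       # doc id -> Counter of raw term frequencies
        self.overrides = {}  # (doc id, term) -> TF value to use instead

    def add(self, doc_id, text):
        self.docs[doc_id] = Counter(text.lower().split())

    def set_term_weight(self, doc_id, term, weight):
        # e.g. "set weight of term 'woman' in document 1231231231 to 1.5"
        self.overrides[(doc_id, term)] = weight

    def tf(self, doc_id, term):
        if term not in self.docs[doc_id]:
            return 0.0  # an override never makes a missing term match
        if (doc_id, term) in self.overrides:
            return self.overrides[(doc_id, term)]
        return math.sqrt(self.docs[doc_id][term])  # classic Lucene-style sqrt(freq)

    def score(self, doc_id, query):
        # IDF is left out: it is the same for every document, so it does not
        # change the ranking of a single-term query.
        return sum(self.tf(doc_id, t) for t in query.lower().split())
```

Before the override, a document containing "woman" twice (TF = √2 ≈ 1.41) outranks document 1231231231 (TF = 1). Setting the weight to 1.5 flips the order.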
Yeah, I agree, though it's tricky to expose this functionality nicely.
Basically, on the indexing side, it is possible to control the
weighting of a term, for example by adding a payload to it that represents the
score. Then, when searching, a custom query takes those
payloads and adds them to the scoring. This means that when indexing, you
will need to provide the weighting (or just the special ones), for example:
"the women^2 went to the beach". It also means that each document whose
term weights you want to change will need to be reindexed with the new
value(s), for example: "the women^1.5 went to the beach^2". That format (^)
denoting a custom weight is something I invented here for the example. Then,
when searching, those weights will be taken into account by a special
custom_weight query that will be created for it. Does that sound like
something that will help you?
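Here is a sketch of the payload idea, using the ad-hoc `term^weight` notation from the message above. It makes three assumptions:

- tokenization is whitespace only;
- tokens without `^` get a payload of 1.0;
- a query term contributes the average of the payloads stored with its occurrences.

The function names are hypothetical. The TF-IDF part of the score is omitted so the focus stays on payloads. Lucene's real machinery is not modelled: it ships a `DelimitedPayloadTokenFilter` that parses a similar delimited syntax at analysis time, plus payload-aware queries.

```python
from collections import defaultdict

DEFAULT_WEIGHT = 1.0


def parse_weighted_text(text):
    """Split on whitespace and attach a payload weight to every token."""
    tokens = []
    for raw in text.split():
        term, sep, weight = raw.partition("^")
        tokens.append((term.lower(), float(weight) if sep else DEFAULT_WEIGHT))
    return tokens


def index_payloads(text):
    """Map each term to the payloads stored with its occurrences."""
    payloads = defaultdict(list)
    for term, weight in parse_weighted_text(text):
        payloads[term].append(weight)
    return dict(payloads)


def payload_score(payloads, query_terms):
    """Sum, over query terms present in the document, the average payload."""
    score = 0.0
    for term in query_terms:
        if term in payloads:
            weights = payloads[term]
            score += sum(weights) / len(weights)
    return score
```

Changing a weight means re-running `index_payloads` on the re-annotated text. This mirrors the reindexing cost described above.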
On Fri, Sep 10, 2010 at 5:00 PM, Mykhailo Korbakov <rmihael@gmail.com> wrote:
Oh, I wish this trick would work in my case.
I could come up with a solution for a few of these corrected weights, but
unfortunately there are plans for thousands of them.
Don't you think that access to term weights (both the TF and IDF parts) could be
useful in ES? Having some way to modify IDF weights is really useful for
advanced ranking algorithms with implicit feedback processing. As of now we
have to pass term weights in the queries, which isn't very
convenient.
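For comparison, this is roughly what "passing term weights in the queries" looks like. The sketch assumes a recent Elasticsearch version, where a `term` query accepts a per-clause `boost`. The field `body` and the helper name are hypothetical. `term` queries are not analyzed, so the weighted terms must already be in their indexed form. The weights also have to be resent with every request instead of living in the index.

```python
def weighted_terms_query(term_weights, field="body"):
    """Build a bool query where each (already analyzed) term carries its
    own query-time boost."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"term": {field: {"value": term, "boost": boost}}}
                    for term, boost in sorted(term_weights.items())
                ]
            }
        }
    }
```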
What you are asking for is quite advanced. There are things that you
can do and things that you simply can't, mainly because of how Lucene, the
underlying search library elasticsearch uses, is built. For example,
changing the weight of a term at runtime is not
possible in Lucene. I suggest you read this page, which gives a good overview
of how Lucene does its scoring: Apache Lucene - Scoring. We can continue the
discussion from there.
-shay.banon
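Lucene's scoring documentation describes the classic TF-IDF "practical scoring function":

    score(q,d) = coord(q,d) · queryNorm(q) · Σ_t tf(t in d) · idf(t)² · boost(t) · norm(t,d)

Lucene has since switched its default to BM25. Below is a plain-Python transcription of the classic formula using DefaultSimilarity's factors:

- tf = √freq
- idf = 1 + ln(numDocs/(docFreq+1))
- lengthNorm = 1/√numTerms

It ignores index-time boosts and the lossy one-byte encoding of norms. The function names are illustrative only. The sketch shows Shay's point: every per-document factor (term frequency, field norm) is fixed at index time, so there is nothing to adjust at runtime.

```python
import math


def tf(freq):
    # classic DefaultSimilarity: square root of the raw term frequency
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    # classic DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    # shorter fields score higher: 1 / sqrt(number of terms in the field)
    return 1.0 / math.sqrt(num_terms)


def practical_score(query_terms, doc_terms, doc_freqs, num_docs, boosts=None):
    """coord * queryNorm * sum over query terms of tf * idf^2 * boost * norm."""
    boosts = boosts or {}
    # queryNorm = 1 / sqrt(sum over query terms of (idf * boost)^2)
    sum_sq = sum((idf(doc_freqs[t], num_docs) * boosts.get(t, 1.0)) ** 2
                 for t in query_terms)
    query_norm = 1.0 / math.sqrt(sum_sq)
    matched = [t for t in query_terms if t in doc_terms]
    coord = len(matched) / len(query_terms)  # fraction of query terms matched
    norm = length_norm(len(doc_terms))       # fixed per document at index time
    total = sum(tf(doc_terms.count(t)) * idf(doc_freqs[t], num_docs) ** 2
                * boosts.get(t, 1.0) * norm
                for t in matched)
    return coord * query_norm * total
```

A side effect worth knowing: in a single-term query, a query-time boost cancels out against queryNorm. Boosts only matter relative to other clauses.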
On Tue, Sep 21, 2010 at 12:56 PM, Mykhailo Korbakov <rmihael@gmail.com> wrote:
Hi Shay.
Sorry for the long delay, I was out of town. I believe that this problem splits
into two:
Controlling index-wide term weights, let's call it IDF. This is the
feature with immediate influence on practical applications, as it makes
introducing ranking analysis algorithms like SVMRank much easier and more
straightforward. I don't think, however, that it belongs in the indexing
step. That solution simply breaks the logic: IDF is index-wide, while indexing a
document is "document-wide". I believe that ES needs some terms API that can
be used to query indexed term information and also to set their
individual weights.
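Here is a sketch of the proposed "terms API": index-wide term statistics that can be queried, with IDF values that can be overridden independently of any single document. `TermStats`, `info` and `set_idf` are hypothetical. Nothing like this is assumed to exist in elasticsearch. The default IDF uses the classic Lucene DefaultSimilarity formula. Learned weights (e.g. from click feedback) would be written with `set_idf`.

```python
import math


class TermStats:
    """Hypothetical index-wide term statistics with settable IDF weights."""

    def __init__(self, docs):
        self.num_docs = len(docs)
        self.doc_freq = {}
        for text in docs:
            for term in set(text.lower().split()):
                self.doc_freq[term] = self.doc_freq.get(term, 0) + 1
        self.idf_overrides = {}

    def info(self, term):
        """Query indexed term information."""
        return {"term": term, "doc_freq": self.doc_freq.get(term, 0),
                "idf": self.idf(term)}

    def set_idf(self, term, weight):
        """Set an index-wide weight for a term (e.g. one learned offline)."""
        self.idf_overrides[term] = weight

    def idf(self, term):
        if term in self.idf_overrides:
            return self.idf_overrides[term]
        # classic Lucene DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1))
        return 1.0 + math.log(self.num_docs / (self.doc_freq.get(term, 0) + 1))
```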
One particular problem I can foresee here is text analysis. It's hard to
predict exactly how the terms from my documents will be transformed during
indexing. Maybe it's worth returning the set of analyzed terms in the response
to a document indexing request?
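This illustrates the problem and the proposed fix. The toy analyzer lowercases, keeps only letters and strips a plural "s", which is a crude stand-in for a real stemming analyzer. The hypothetical `index_document` echoes the resulting terms back, so the client knows which terms it can attach weights to. (Current Elasticsearch versions expose `_analyze` and term vectors APIs for inspecting analysis. The sketch does not model them.)

```python
import re


def toy_analyzer(text):
    """Lowercase, keep runs of letters, and strip a trailing plural "s"."""
    terms = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        terms.append(token)
    return terms


def index_document(doc_id, text):
    """Hypothetical indexing call that returns the analyzed terms."""
    return {"_id": doc_id, "terms": toy_analyzer(text)}
```

For example, "The Cats sat on the Mats" comes back as `["the", "cat", "sat", "on", "the", "mat"]`. Neither "Cats" nor "Mats" survives as written.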
Controlling document-wide weights of terms, let's call it TF. Although
our conversation started with it, I believe that it is much less important and
useful than index-wide IDF. Your solution with "the women^1.5 went to the
beach^2" is very nice, but I have one concern: how would it handle
contradictory inputs? For instance, "cat^1 cats^2" in an analyzed field: "cat"
and "cats" will be analyzed (e.g. by a stemmer) into one single term that has
two weights (1, 2). I'm out of ideas on how to design that properly.
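The collision can be reproduced in a few lines. A toy stemmer folds "cats" into "cat", so one indexed term ends up with two weights. The thread leaves open how to reconcile them, so the `policy` argument (max, min, a mean, ...) is purely an assumption of this sketch, as are the function names.

```python
from collections import defaultdict


def stem(token):
    # toy stemmer: drop a trailing "s" so "cats" and "cat" become one term
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def analyzed_weights(text, policy=max):
    """Parse "term^weight" tokens, stem them, and resolve collisions.

    Returns (resolved weights, raw weights collected per analyzed term).
    """
    collected = defaultdict(list)
    for raw in text.split():
        term, sep, weight = raw.partition("^")
        collected[stem(term.lower())].append(float(weight) if sep else 1.0)
    return {term: policy(ws) for term, ws in collected.items()}, dict(collected)
```

`analyzed_weights("cat^1 cats^2")` collects `{"cat": [1.0, 2.0]}`. Any single stored value is a policy decision, not something the input determines.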
BTW, is there any way to set all IDF weights for a particular field to
1.0? I found that the "omit_term_freq_and_positions" parameter disables TF, but
how do I turn off IDF?
Great. Note that all extension points (custom Similarity and so on) can be
hooked into elasticsearch easily; just ping me if you want more info. As a side
note, I know that there has been a lot of discussion in Lucene about
trying to open up this aspect a bit more to allow for more custom
implementations.
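The custom Similarity extension point is also the usual answer to the earlier question about setting all IDF weights to 1.0. In Lucene of that era you would subclass `DefaultSimilarity`, override `idf(...)` to return a constant, and plug the result in. The Python below only mimics that override pattern. `ToyTfIdfSimilarity` and `ToyNoIdfSimilarity` are illustrative names, not Lucene classes.

```python
import math


class ToyTfIdfSimilarity:
    """Toy version of the tf/idf hooks a Lucene Similarity exposes."""

    def tf(self, freq):
        return math.sqrt(freq)

    def idf(self, doc_freq, num_docs):
        return 1.0 + math.log(num_docs / (doc_freq + 1))

    def term_score(self, freq, doc_freq, num_docs):
        return self.tf(freq) * self.idf(doc_freq, num_docs)


class ToyNoIdfSimilarity(ToyTfIdfSimilarity):
    """Same TF behaviour, but every term gets an IDF of 1.0."""

    def idf(self, doc_freq, num_docs):
        return 1.0
```

With the override, a rare term and a very common term with the same frequency in a document score identically.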
On Thu, Sep 23, 2010 at 10:27 PM, Mykhailo Korbakov <rmihael@gmail.com> wrote: