I am new to ES, and I have a task for which I don't know the best approach
to take.
Our data is a simple line of text plus some numeric fields, and our queries
run only on the line of text.
When I query a few terms, the score (as far as I understand) is
calculated in a way that prefers multiple occurrences of a term in the
text, and also prefers longer matches.
If I wanted to change that (say, ignore how many times a term
appears and ignore the length), I would write this in Lucene:
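    // Imports assume the Lucene 3.x API; package names differ in Lucene 4.
    import org.apache.lucene.index.FieldInvertState;
    import org.apache.lucene.search.DefaultSimilarity;
    public class MySimilarity extends DefaultSimilarity {
        // We don't care how many times a term appears in the text.
        @Override
        public float tf(float freq) {
            return freq == 0 ? 0 : 1;
        }
        // Ignore the length factor; keep only the index-time boost.
        @Override
        public float computeNorm(String field, FieldInvertState state) {
            return state.getBoost();
        }
    }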
Now my question is: is there a way to do this kind of thing in ES without
actually writing code, i.e. by using the DSL?
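For anyone following along, here is a minimal plain-Java sketch of what the two overrides change. It has no Lucene dependency, and the class and method names are made up for illustration. The "default" formulas (square root of the frequency, boost divided by the square root of the number of terms) are my understanding of DefaultSimilarity in Lucene 3.x, ignoring the lossy one-byte encoding of norms. Note that under this assumption the default norm shrinks as the field gets longer.

```java
// Hypothetical stand-alone sketch; not part of Lucene or Elasticsearch.
final class ScoreFactorSketch {
    // Assumed default: tf grows with the square root of the term frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Flattened tf from the post: any match counts exactly once.
    static float flatTf(float freq) {
        return freq == 0 ? 0f : 1f;
    }

    // Assumed default: boost scaled down by the square root of the field length.
    static float defaultNorm(float boost, int numTerms) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // Flattened norm from the post: field length is ignored, only the boost remains.
    static float flatNorm(float boost, int numTerms) {
        return boost;
    }

    public static void main(String[] args) {
        for (float freq : new float[] {1f, 4f, 9f}) {
            System.out.println("freq=" + freq
                + " defaultTf=" + defaultTf(freq)
                + " flatTf=" + flatTf(freq));
        }
        for (int len : new int[] {4, 100}) {
            System.out.println("numTerms=" + len
                + " defaultNorm=" + defaultNorm(1f, len)
                + " flatNorm=" + flatNorm(1f, len));
        }
    }
}
```

With the flattened versions, a term that appears nine times contributes the same tf as a term that appears once, and a 100-term field gets the same norm as a 4-term field.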
This should be equivalent to what you want, and you can influence how much
weight the boost gets at runtime.
simon
On Wednesday, February 20, 2013 5:31:54 PM UTC+1, Shlomi wrote:
[...]
"omit_norms" seemed to do the job, but setting "index_options" to "docs"
broke searches that are not plain term matches, meaning I couldn't run a
query_string like: +bre* -break* AND "Tons of"
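To illustrate the phrase part of that problem, here is a small plain-Java sketch of a toy inverted index. It is not how Lucene stores postings; the class and method names are hypothetical, and the tokenizer is just a whitespace split. It only shows that a docs-only view can tell which documents contain both terms, while matching "Tons of" as a phrase needs positions. The wildcard part of the query is not covered here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy index: term -> docId -> positions of the term in that document.
final class PositionsSketch {
    static Map<String, Map<Integer, List<Integer>>> index(String... docs) {
        Map<String, Map<Integer, List<Integer>>> idx = new HashMap<>();
        for (int doc = 0; doc < docs.length; doc++) {
            String[] tokens = docs[doc].toLowerCase().split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                idx.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                   .computeIfAbsent(doc, d -> new ArrayList<>())
                   .add(pos);
            }
        }
        return idx;
    }

    // Docs-only view: which documents contain every term, in any order?
    static Set<Integer> docsContainingAll(
            Map<String, Map<Integer, List<Integer>>> idx, String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs =
                idx.getOrDefault(term, Collections.emptyMap()).keySet();
            if (result == null) {
                result = new TreeSet<Integer>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    // Positional view: the terms must sit at consecutive positions.
    static Set<Integer> phraseMatches(
            Map<String, Map<Integer, List<Integer>>> idx, String... terms) {
        Set<Integer> hits = new TreeSet<Integer>();
        for (int doc : docsContainingAll(idx, terms)) {
            for (int start : idx.get(terms[0]).get(doc)) {
                boolean ok = true;
                for (int i = 1; i < terms.length && ok; i++) {
                    ok = idx.get(terms[i]).get(doc).contains(start + i);
                }
                if (ok) {
                    hits.add(doc);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, List<Integer>>> idx =
            index("Tons of fun", "of tons");
        System.out.println("contain both terms: " + docsContainingAll(idx, "tons", "of"));
        System.out.println("match the phrase:   " + phraseMatches(idx, "tons", "of"));
    }
}
```

Once positions are dropped, only the first kind of question can be answered, so the two example documents can no longer be told apart for a phrase query.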
On Thursday, February 21, 2013 11:02:15 AM UTC+1, Shlomi wrote:
Hey, thank you for your response.
"omit_norms" seemed to do the job, but setting "index_options" to "docs"
broke searches that are not plain term matches, meaning I couldn't run a
query_string like: +bre* -break* AND "Tons of"
Ah, I see. Yes, setting this to "docs" drops positions, so phrase queries like
"Tons of" won't work anymore. UniqueTokenFilter should do the job here!
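As a rough illustration of the idea behind a unique-token filter, here is a plain-Java sketch that drops repeated tokens so every term occurs at most once per field, which makes the term frequency effectively 1. This is only an approximation with made-up names; the real filter works on an analysis token stream, and the whitespace tokenizer here is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch of token de-duplication; not the Elasticsearch implementation.
final class UniqueTokensSketch {
    // Naive analyzer stand-in: lowercase and split on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().split("\\s+"));
    }

    // Keep only the first occurrence of each token, preserving order.
    static List<String> unique(List<String> tokens) {
        return new ArrayList<>(new LinkedHashSet<>(tokens));
    }

    // How often a term occurs in the token list.
    static int freq(List<String> tokens, String term) {
        return Collections.frequency(tokens, term);
    }

    public static void main(String[] args) {
        List<String> raw = tokenize("break break break the ice");
        List<String> deduped = unique(raw);
        System.out.println("raw freq of 'break':     " + freq(raw, "break"));
        System.out.println("deduped freq of 'break': " + freq(deduped, "break"));
        System.out.println("deduped tokens:          " + deduped);
    }
}
```

Because the de-duplication happens at analysis time, the scoring formula itself stays untouched; it simply never sees a frequency above 1.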
So I tried that, and it worked fine until I tried to query something like
"bye bye", which was indistinguishable from "bye" (as opposed to "bye
now", for instance).
Of course I could use a shingle token filter, but that would needlessly
enlarge my index.
Any other suggestions?
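To make the trade-off concrete, here is a plain-Java sketch with hypothetical names and a naive whitespace tokenizer. It shows that after de-duplication "bye bye" and "bye" produce the same tokens while "bye now" does not, and that adding two-word shingles before de-duplicating restores the distinction at the cost of one extra term per adjacent pair of words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch; not the Elasticsearch unique or shingle filter.
final class ShingleSketch {
    // Naive analyzer stand-in: lowercase and split on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().split("\\s+"));
    }

    // Keep only the first occurrence of each token, preserving order.
    static List<String> unique(List<String> tokens) {
        return new ArrayList<>(new LinkedHashSet<>(tokens));
    }

    // Original tokens followed by every two-word shingle.
    static List<String> withShingles(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(unique(tokenize("bye bye")));
        System.out.println(unique(tokenize("bye")));
        System.out.println(unique(tokenize("bye now")));
        System.out.println(unique(withShingles(tokenize("bye bye"))));
        System.out.println(withShingles(tokenize("tons of fun today")).size());
    }
}
```

In this sketch a field with n words ends up with n - 1 additional shingle terms before de-duplication, which is the index growth mentioned above.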
On Saturday, February 23, 2013 12:39:47 AM UTC+2, simonw wrote:
[...]