Hi Ivan,
I followed your instructions but it does not seem to work; I must have gone
wrong somewhere. I created the jar file from the following two Java files.
Could you tell me if they are ok?
tfCappedSimilarity.java

    package org.elasticsearch.index.similarity;

    import org.apache.lucene.search.similarities.DefaultSimilarity;
    import org.elasticsearch.common.logging.ESLogger;
    import org.elasticsearch.common.logging.Loggers;

    public class tfCappedSimilarity extends DefaultSimilarity {

        private ESLogger logger;

        public tfCappedSimilarity() {
            logger = Loggers.getLogger(getClass());
        }

        /**
         * Capped tf value
         */
        @Override
        public float tf(float freq) {
            return (float) Math.sqrt(Math.min(9, freq));
        }
    }
tfCappedSimilarityProvider.java

    package org.elasticsearch.index.similarity;

    import org.elasticsearch.common.inject.Inject;
    import org.elasticsearch.common.inject.assistedinject.Assisted;
    import org.elasticsearch.common.settings.Settings;

    public class tfCappedSimilarityProvider extends AbstractSimilarityProvider {

        private tfCappedSimilarity similarity;

        @Inject
        public tfCappedSimilarityProvider(@Assisted String name, @Assisted Settings settings) {
            super(name);
            this.similarity = new tfCappedSimilarity();
        }

        /**
         * {@inheritDoc}
         */
        @Override
        public tfCappedSimilarity get() {
            return similarity;
        }
    }

In my mapping, I set the similarity property of my field to
tfCappedSimilarity. Is that ok?
What makes me say that it does not work: I inserted a doc with a word
repeated 16 times in my field. When I search for that word, the result
shows a tf of 4 (the square root of 16) and not 3 as I was expecting. Is
there a way to know whether the similarity was loaded (maybe in a log
file)?
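
For reference, the arithmetic behind the expected value can be checked outside Elasticsearch. This is a minimal standalone sketch, assuming the uncapped tf is the square root of the frequency (as in Lucene's DefaultSimilarity); the class CappedTfDemo and its method names are hypothetical and only mirror the tf() override above:

```java
public class CappedTfDemo {

    // Hypothetical helper mirroring the tf() override above:
    // cap the raw frequency at 9, then take the square root.
    static float cappedTf(float freq) {
        return (float) Math.sqrt(Math.min(9f, freq));
    }

    // Uncapped square-root tf, the behaviour assumed for the default similarity.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        float freq = 16f; // the word is repeated 16 times in the field
        System.out.println("default tf = " + defaultTf(freq)); // prints 4.0
        System.out.println("capped tf  = " + cappedTf(freq));  // prints 3.0
    }
}
```

So a reported tf of 4 for 16 occurrences is what the uncapped formula gives, which suggests the custom class is not the one being used for the field.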
Cheers,
Patrick
On Wednesday, March 26, 2014 at 17:16:36 UTC-4, Ivan Brusic wrote:
I updated my gist to illustrate the SimilarityProvider that goes along
with it. Similarities are easier to add to Elasticsearch than most plugins.
You just need to compile the two files into a jar and then add that jar
into Elasticsearch's classpath ($ES_HOME/lib most likely). The code will
scan for every SimilarityProvider defined and load it.
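
One rough way to confirm that a jar placed in $ES_HOME/lib really exposes the provider class is to ask a JVM started with that classpath to resolve it. This is only a sketch: ClasspathCheck and isOnClasspath are hypothetical names, and a positive result shows that the class is visible on the classpath, not that Elasticsearch has registered the similarity.

```java
public class ClasspathCheck {

    // Returns true if the named class can be resolved by this class's loader.
    // The class is located but not initialized (second argument is false).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass one or more fully qualified class names as arguments.
        for (String name : args) {
            System.out.println(name + " visible: " + isOnClasspath(name));
        }
    }
}
```

It could be run as java -cp "$ES_HOME/lib/*:." ClasspathCheck followed by the fully qualified name of your provider class.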
You then map the similarity to a field:
Elasticsearch Platform — Find real-time answers at scale | Elastic
Note that you cannot change the similarity of a field dynamically.
Ivan
On Wed, Mar 26, 2014 at 12:49 PM, geantbrun <agin.p...@gmail.com> wrote:
Britta is looping over words that are passed as parameters. It's easy to
implement her script for a simple query, but what about boolean queries? In
my understanding (but I could be wrong of course), I would have to parse
the query to call the script with each sub-clause. Am I wrong?
I prefer your custom similarity alternative. Again, sorry for the silly
question (newbie!) but where do you put your java file? Is it the only
thing that is needed (except for the modification in the mapping)?
cheers,
Patrick
On Wednesday, March 26, 2014 at 11:58:52 UTC-4, Ivan Brusic wrote:
I am still on a version of Elasticsearch that does not have access to
the new scoring capabilities, so I cannot test out any scripts. The
non-normalized term frequency should come from the line:
tf = _index[field][word].tf()
If that is the case, you could substitute that line with something like:
tf = Math.min(10, _index[field][word].tf())
As I stated before, I am used to using Similarities, so I find that
approach easier. Here is a custom similarity that I used in Elasticsearch
(it removes any norms that are indexed):
Norm Removal Machine · GitHub
The second part would be the tf() method, which you would need to implement
instead of the decodeNormValue I used.
Cheers,
Ivan