Thank you again, Ivan (and sorry for the silence, I was away these last few
days).
I built the jar with Maven. The problem I have now is a compilation failure
caused by the @Override annotation in NormRemovalSimilarity.java ("method
does not override or implement a method from a supertype"). When I comment
out that line, the jar builds successfully, but I think the new
decodeNormValue function is then not overriding the original one (as
expected!). Indeed, when I search my contents field, which has
similarity=my_similarity, the explanation of the score is:
...
{
"value": 0.25,
"description": "fieldNorm(doc=0)"
}
...
I suppose that under the new similarity, the value should be 1.0, shouldn't
it?
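A minimal sketch of why commenting out @Override would leave the score unchanged, assuming the compile error comes from a parameter type mismatch with the supertype method (the exact Lucene signature is not shown in this thread). The classes below are hypothetical stand-ins, not the Lucene or gist classes. A method whose parameter type differs from the supertype's is an overload, so calls made through the supertype never reach it.

```java
// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
class BaseSimilarity {
    // Assumed supertype signature: the stored norm arrives as a long.
    public float decodeNormValue(long norm) {
        return 0.25f; // stands in for the decoded field norm
    }
}

class NormRemovalOverload extends BaseSimilarity {
    // Parameter type differs (byte vs long), so this is only an overload.
    // Putting @Override on this method would fail to compile.
    public float decodeNormValue(byte norm) {
        return 1.0f;
    }
}

class NormRemovalOverride extends BaseSimilarity {
    // Same signature as the supertype, so this really overrides it.
    @Override
    public float decodeNormValue(long norm) {
        return 1.0f;
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        BaseSimilarity overloaded = new NormRemovalOverload();
        BaseSimilarity overridden = new NormRemovalOverride();
        long storedNorm = 124L; // arbitrary value for the demo
        System.out.println(overloaded.decodeNormValue(storedNorm)); // 0.25
        System.out.println(overridden.decodeNormValue(storedNorm)); // 1.0
    }
}
```

If this is what is happening, the fix would be to match the parameter and return types of the method declared in the Lucene version being compiled against, and to keep the @Override annotation so the compiler can confirm it.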
Cheers,
Patrick
On Thursday, April 3, 2014 at 12:15:15 UTC-4, Ivan Brusic wrote:
I added a simple Maven pom to the gist:
Norm Removal Machine · GitHub
The easiest thing to do is to download Maven (if you do not have it) and let
it take care of handling the dependencies. It builds a jar if you simply
execute:
mvn package
Since Elasticsearch already comes bundled with the correct jars, you can
also add those to your classpath instead. I think you only need Lucene
core, which is in $ES_HOME/lib/lucene-core-4-?-?.jar. Substitute the
question marks for the correct version. I am not on Elasticsearch, so I do
not know offhand which version of Lucene is packaged.
--
Ivan
On Thu, Apr 3, 2014 at 7:44 AM, geantbrun <agin.p...@gmail.com> wrote:
Ivan,
Sorry, but I realize (I'm totally new to Java) that I skipped the Java
compile step (I simply put the java files in a jar file with jar cf). The
problem now is that executing:
javac NormRemovalSimilarity.java -classpath ./elasticsearch-1.1.0.jar
generates errors, the first one being:
package org.apache.lucene.search.similarities does not exist
I googled it but found nothing. Any idea?
Patrick
P.S. I installed Elasticsearch the easy way, following
https://gist.github.com/wingdspur/2026107 (dpkg on the deb file).
On Thursday, April 3, 2014 at 09:16:02 UTC-4, geantbrun wrote:
Thanks again for your great help, Ivan. It does not work for me. When I
replace NormRemovalSimilarityProvider with BM25SimilarityProvider (or
simply BM25), it works. Is it possible that I put my jar file in the
wrong directory (usr/share/elasticsearch/lib)? Is it necessary to
register the new classes somewhere before restarting the service?
Cheers,
Patrick
On Wednesday, April 2, 2014 at 17:47:46 UTC-4, Ivan Brusic wrote:
Are you using a full class name? I have no problems with
curl -XPOST 'http://localhost:9200/sim/' -d '
{
  "settings" : {
    "similarity" : {
      "my_similarity" : {
        "type" : "org.elasticsearch.index.similarity.NormRemovalSimilarityProvider"
      }
    }
  },
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
        "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
        "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "my_similarity" }
      }
    }
  }
}
'
On Wed, Apr 2, 2014 at 12:03 PM, geantbrun agin.p...@gmail.com wrote:
In order to better understand the error, I copied your
NormRemovalSimilarity and NormRemovalSimilarityProvider code snippets into
usr/share/elasticsearch/lib. I put these 2 files in a jar named
NormRemovalSimilarity.jar. After restarting the elasticsearch service, I
tried to create the index with the same mapping as before (except that I
put "type" : "NormRemoval" in the settings of my_similarity).
The result is the same:
{"error":"IndexCreationException[[exbd] failed to create index]; nested: NoClassSettingsException[Failed to load class setting [type] with value [NormRemoval]]; nested: ClassNotFoundException[org.elasticsearch.index.similarity.normremoval.NormRemovalSimilarityProvider]; ","status":500}]
I deleted the jar file just to see if the error is the same: yes, it
is. It's as if the new similarity is never found or loaded. Does it still
work without modifications on your side?
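One thing that may be worth checking when a ClassNotFoundException like this appears is whether the jar actually contains a compiled .class entry for the provider. Below is a minimal sketch using only the JDK. The class JarClassCheck and its method names are made up for illustration, and the check says nothing about whether Elasticsearch itself has the jar on its classpath.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

class JarClassCheck {
    // Maps a fully qualified class name to the entry path expected inside a jar.
    static String classEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the jar holds a compiled entry for the given class.
    static boolean jarContainsClass(File jar, String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getEntry(classEntryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarClassCheck <jar> <fully.qualified.ClassName>");
            return;
        }
        System.out.println(jarContainsClass(new File(args[0]), args[1]));
    }
}
```

For example, it could be run against NormRemovalSimilarity.jar with the class name org.elasticsearch.index.similarity.NormRemovalSimilarityProvider. A jar that only holds .java sources would report false.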
Cheers,
Patrick
On Wednesday, April 2, 2014 at 00:31:44 UTC-4, Ivan Brusic wrote:
It has been a while since I used a custom similarity, but what you
have looks right. Can you try a full class name instead?
Use org.elasticsearch.index.similarity.tfCappedSimilarityProvider.
According to the error, it is looking for
org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider.
--
Ivan
On Tue, Apr 1, 2014 at 7:00 AM, geantbrun agin.p...@gmail.com wrote:
Sure.
{
  "settings" : {
    "index" : {
      "similarity" : {
        "my_similarity" : {
          "type" : "tfCappedSimilarity"
        }
      }
    }
  },
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
        "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
        "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "my_similarity" }
      }
    }
  }
}
If I replace tfCappedSimilarity with tfCapped in the mapping, the
error is the same except that the provider is referred to as
tfCappedSimilarityProvider and not as
tfCappedSimilaritySimilarityProvider.
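Judging only from the two error messages quoted in this thread, the short type name seems to be expanded into a package and class name before the class is loaded. The sketch below reproduces that observed pattern. It is an inference from those messages, not Elasticsearch's actual lookup code, and the class and helper names are hypothetical.

```java
import java.util.Locale;

class ProviderNameSketch {
    // Prefix and suffix as they appear in the quoted error messages.
    static final String PREFIX = "org.elasticsearch.index.similarity.";
    static final String SUFFIX = "SimilarityProvider";

    // Guessed expansion: lower-cased type as a sub-package, then type + suffix.
    static String guessedProviderClass(String type) {
        return PREFIX + type.toLowerCase(Locale.ROOT) + "." + type + SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(guessedProviderClass("tfCappedSimilarity"));
        System.out.println(guessedProviderClass("tfCapped"));
        System.out.println(guessedProviderClass("NormRemoval"));
    }
}
```

Under this reading, a type of tfCappedSimilarity already ends in "Similarity", which would explain why the word appears twice in the class name the error reports.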
Cheers,
Patrick
On Monday, March 31, 2014 at 17:13:24 UTC-4, Ivan Brusic wrote:
Can you also post your mapping where you defined the similarity?
--
Ivan
On Mon, Mar 31, 2014 at 10:36 AM, geantbrun agin.p...@gmail.com wrote:
I realize that I probably have to define the similarity property
of my field as "my_similarity" (and not as "tfCappedSimilarity") and define
my_similarity in the settings as being of type tfCappedSimilarity.
When I do that, I get the following error at index/mapping creation:
{"error":"IndexCreationException[[exbd] failed to create index]; nested: NoClassSettingsException[Failed to load class setting [type] with value [tfCappedSimilarity]]; nested: ClassNotFoundException[org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider]; ","status":500}]
Note that the provider is referred to in the error as
tfCappedSimilaritySimilarityProvider (similarity repeated 2
times). Is that normal?
Patrick
On Monday, March 31, 2014 at 13:06:00 UTC-4, geantbrun wrote:
Hi Ivan,
I followed your instructions but it does not seem to work, so I must
have gone wrong somewhere. I created the jar file from the following two
java files. Could you tell me if they are ok?

tfCappedSimilarity.java
package org.elasticsearch.index.similarity;

import org.apache.lucene.search.similarities.DefaultSimilarity;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.Loggers;

public class tfCappedSimilarity extends DefaultSimilarity {
    private ESLogger logger;

    public tfCappedSimilarity() {
        logger = Loggers.getLogger(getClass());
    }

    /**
     * Capped tf value
     */
    @Override
    public float tf(float freq) {
        return (float) Math.sqrt(Math.min(9, freq));
    }
}

tfCappedSimilarityProvider.java
package org.elasticsearch.index.similarity;

import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;

public class tfCappedSimilarityProvider extends AbstractSimilarityProvider {
    private tfCappedSimilarity similarity;

    @Inject
    public tfCappedSimilarityProvider(@Assisted String name,
                                      @Assisted Settings settings) {
        super(name);
        this.similarity = new tfCappedSimilarity();
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public tfCappedSimilarity get() {
        return similarity;
    }
}
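The tf() override can be tried outside Elasticsearch to confirm the expected numbers. This is a minimal standalone sketch, assuming the default tf is the square root of the frequency (consistent with the tf of 4 for 16 occurrences reported in this message). CappedTfDemo is a hypothetical name and no Lucene classes are involved.

```java
class CappedTfDemo {
    // Assumed default behaviour: tf grows as the square root of the frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Same formula as the tf() override above: frequency is capped at 9 first.
    static float cappedTf(float freq) {
        return (float) Math.sqrt(Math.min(9, freq));
    }

    public static void main(String[] args) {
        System.out.println(defaultTf(16f)); // 4.0
        System.out.println(cappedTf(16f));  // 3.0
        System.out.println(cappedTf(4f));   // 2.0, below the cap so unchanged
    }
}
```

A tf of 4 in the explain output for 16 occurrences therefore matches the uncapped formula, which suggests the custom class is not the one being used for the field.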
In my mapping, I define the similarity property of my field as
tfCappedSimilarity. Is that ok?
What makes me say that it does not work: I insert a doc with a
word repeated 16 times in my field. When I search for that word, the
result shows a tf of 4 (square root of 16) and not 3 as I was expecting. Is
there a way to know whether the similarity was loaded or not (maybe in a log
file)?
Cheers,
Patrick
On Wednesday, March 26, 2014 at 17:16:36 UTC-4, Ivan Brusic wrote:
I updated my gist to illustrate the SimilarityProvider that goes
along with it. Similarities are easier to add to Elasticsearch than most
plugins. You just need to compile the two files into a jar and then add
that jar to Elasticsearch's classpath ($ES_HOME/lib most likely). The
code will scan for every SimilarityProvider defined and load it.
You then map the similarity to a field: see the "configuring similarity
per field" section of the Elasticsearch reference
(mapping-core-types.html#_configuring_similarity_per_field).
Note that you cannot change the similarity of a field dynamically.
Ivan
On Wed, Mar 26, 2014 at 12:49 PM, geantbrun <agin.p...@gmail.com> wrote:
Britta is looping over words that are passed as parameters.
It's easy to implement her script for a simple query, but what about boolean
queries? In my understanding (but I could be wrong, of course), I would have
to parse the query to call the script with each sub-clause. Am I wrong?
I prefer your custom similarity alternative. Again, sorry for
the silly question (newbie!), but where do you put your Java file? Is it the
only thing that is needed (apart from the modification in the mapping)?
Cheers,
Patrick
On Wednesday, March 26, 2014 at 11:58:52 UTC-4, Ivan Brusic wrote:
I am still on a version of Elasticsearch that does not have
access to the new scoring capabilities, so I cannot test out any scripts.
The non-normalized term frequency should be the line:
tf = _index[field][word].tf()
If that is the case, you could substitute that line with
something like:
tf = Math.min(10, _index[field][word].tf())
As I stated before, I am used to using Similarities, so I find
that example easier. Here is a custom similarity that I used in
Elasticsearch (it removes any norms that are indexed):
Norm Removal Machine · GitHub
The second part would be the tf() method, which you would need to
implement instead of the decodeNormValue I used.
Cheers,
Ivan
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/6370b4dc-8243-4aea-918a-e4e4e9588aaf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.