How to modify term frequency formula?

Thank you again Ivan (and sorry for the silence, I was away these last few
days).
I made the jar with Maven. The problem I have now is a compilation
failure due to the override annotation in NormRemovalSimilarity.java ("method
does not override or implement a method from a supertype"). When I comment
that line out, the jar builds successfully, but I think the new
decodeNormValue function is not overriding the original one (normal!).
Indeed, when I search my contents field, which has similarity=my_similarity,
the explanation of the score is:

...
{
"value": 0.25,
"description": "fieldNorm(doc=0)"
}
...

I suppose that under the new similarity, the value should be 1.0, shouldn't
it?
Cheers,
Patrick

On Thursday, April 3, 2014 at 12:15:15 UTC-4, Ivan Brusic wrote:

I added a simple Maven pom to the gist:
https://gist.github.com/brusic/9786587#file-pom-xml

The easiest thing to do is download Maven (if you do not have it) and let it
take care of the dependencies. It will build a jar if you simply execute:
mvn package

Since Elasticsearch already comes bundled with the correct jars, you can
also add those to your classpath instead. I think you only need Lucene
core, which is in $ES_HOME/lib/lucene-core-4-?-?.jar. Substitute the
question marks with the correct version. I am not on Elasticsearch, so I do
not know offhand which version of Lucene is packaged.

--
Ivan

On Thu, Apr 3, 2014 at 7:44 AM, geantbrun <agin.p...@gmail.com> wrote:

Ivan,
Sorry, but I realize (I know nothing about Java) that I skipped the Java
compile step (I simply put the java files in a jar file with jar cf). The
problem now is that executing:

javac NormRemovalSimilarity.java -classpath ./elasticsearch-1.1.0.jar

generates errors, the first one being:

package org.apache.lucene.search.similarities does not exist

Googled it but found nothing. Any idea?
Patrick

P.S. I installed elasticsearch following the easy way, https://gist.github.com/wingdspur/2026107 (dpkg the deb file)

On Thursday, April 3, 2014 at 09:16:02 UTC-4, geantbrun wrote:

Thanks again for your great help Ivan. It does not work for me. When I
replace NormRemovalSimilarityProvider with BM25SimilarityProvider (or
simply with BM25), it works. Is it possible that I put my jar file in the
wrong directory (usr/share/elasticsearch/lib)? Is it necessary to
register the new classes I define somewhere before restarting the service?
Cheers,
Patrick

On Wednesday, April 2, 2014 at 17:47:46 UTC-4, Ivan Brusic wrote:

Are you using a full class name? I have no problems with

curl -XPOST 'http://localhost:9200/sim/' -d '
{
  "settings" : {
    "similarity" : {
      "my_similarity" : {
        "type" : "org.elasticsearch.index.similarity.NormRemovalSimilarityProvider"
      }
    }
  },
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
        "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
        "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "my_similarity" }
      }
    }
  }
}
'

On Wed, Apr 2, 2014 at 12:03 PM, geantbrun agin.p...@gmail.com wrote:

To better understand the error, I copied your
NormRemovalSimilarity and NormRemovalSimilarityProvider code snippets into
usr/share/elasticsearch/lib and put these 2 files in a jar named
NormRemovalSimilarity.jar. After restarting the elasticsearch service, I
tried to create the index with the same mapping as before (except that I
put "type" : "NormRemoval" in the settings of my_similarity).

The result is the same:
{"error":"IndexCreationException[[exbd] failed to create index]; nested: NoClassSettingsException[Failed to load class setting [type] with value [NormRemoval]]; nested: ClassNotFoundException[org.elasticsearch.index.similarity.normremoval.NormRemovalSimilarityProvider]; ","status":500}]

I deleted the jar file just to see if the error is the same: yes it
is. It's like the new similarity is never found or loaded. Is it still
working without modifications on your side?
Cheers,
Patrick

On Wednesday, April 2, 2014 at 00:31:44 UTC-4, Ivan Brusic wrote:

It has been a while since I used a custom similarity, but what you
have looks right. Can you try a full class name instead?
Use org.elasticsearch.index.similarity.tfCappedSimilarityProvider.
According to the error, it is looking for
org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider.

--
Ivan
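
The doubled "Similarity" in the error can be reproduced with a small sketch. It assumes, purely from the two error messages quoted in this thread, that a short type name gets expanded to package prefix + lowercased name + "." + name + "SimilarityProvider". The names ProviderNameDemo and lastAttempt are made up for illustration; this is not Elasticsearch's actual lookup code.

```java
import java.util.Locale;

public class ProviderNameDemo {
    // Prefix and suffix as they appear in the ClassNotFoundException messages above.
    static final String PREFIX = "org.elasticsearch.index.similarity.";
    static final String SUFFIX = "SimilarityProvider";

    // Hypothetical reconstruction of the class name reported in the error
    // when "type" is given as a short name rather than a full class name.
    public static String lastAttempt(String type) {
        return PREFIX + type.toLowerCase(Locale.ROOT) + "." + type + SUFFIX;
    }

    public static void main(String[] args) {
        // Short names that were tried in this thread.
        System.out.println(lastAttempt("tfCappedSimilarity"));
        System.out.println(lastAttempt("tfCapped"));
        System.out.println(lastAttempt("NormRemoval"));
    }
}
```

Under that assumption, none of the short names resolves to a class sitting directly in org.elasticsearch.index.similarity, which would be consistent with the advice to pass the full class name as the type.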

On Tue, Apr 1, 2014 at 7:00 AM, geantbrun agin.p...@gmail.com wrote:

Sure.

{
  "settings" : {
    "index" : {
      "similarity" : {
        "my_similarity" : {
          "type" : "tfCappedSimilarity"
        }
      }
    }
  },
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
        "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
        "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "my_similarity" }
      }
    }
  }
}

If I replace tfCappedSimilarity with tfCapped in the mapping, the
error is the same except that the provider is referred to as
tfCappedSimilarityProvider and not as tfCappedSimilaritySimilarityProvider.
Cheers,
Patrick

On Monday, March 31, 2014 at 17:13:24 UTC-4, Ivan Brusic wrote:

Can you also post your mapping where you defined the similarity?

--
Ivan

On Mon, Mar 31, 2014 at 10:36 AM, geantbrun agin.p...@gmail.com wrote:

I realize that I probably have to define the similarity property
of my field as "my_similarity" (and not as "tfCappedSimilarity") and define
in the settings my_similarity as being of type tfCappedSimilarity.
When I do that, I get the following error at the index/mapping
creation:

{"error":"IndexCreationException[[exbd] failed to create index]; nested: NoClassSettingsException[Failed to load class setting [type] with value [tfCappedSimilarity]]; nested: ClassNotFoundException[org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider]; ","status":500}]

Note that the provider is referred to in the error as
tfCappedSimilaritySimilarityProvider ("Similarity" repeated twice).
Is that normal?
Patrick

On Monday, March 31, 2014 at 13:06:00 UTC-4, geantbrun wrote:

Hi Ivan,
I followed your instructions but it does not seem to work, so I must
have gone wrong somewhere. I created the jar file from the following two java
files. Could you tell me if they are ok?

tfCappedSimilarity.java


package org.elasticsearch.index.similarity;

import org.apache.lucene.search.similarities.DefaultSimilarity;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.Loggers;

public class tfCappedSimilarity extends DefaultSimilarity {

    private ESLogger logger;

    public tfCappedSimilarity() {
            logger = Loggers.getLogger(getClass());
    }

    /**
     * Capped tf value
     */
    @Override
    public float tf(float freq) {
            return (float)Math.sqrt(Math.min(9, freq));
    }

}

tfCappedSimilarityProvider.java


package org.elasticsearch.index.similarity;

import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;

public class tfCappedSimilarityProvider extends AbstractSimilarityProvider {

    private tfCappedSimilarity similarity;

    @Inject
    public tfCappedSimilarityProvider(@Assisted String name, @Assisted Settings settings) {
        super(name);
        this.similarity = new tfCappedSimilarity();
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public tfCappedSimilarity get() {
            return similarity;
    }

}

In my mapping, I define the similarity property of my field as
tfCappedSimilarity. Is that ok?

Here is what makes me say that it does not work: I insert a doc with a
word repeated 16 times in my field. When I search for that word, the
result shows a tf of 4 (square root of 16) and not 3 as I was expecting. Is
there a way to know whether the similarity was loaded or not (maybe in a log
file)?

Cheers,
Patrick
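
For reference, the arithmetic behind the expected value of 3 can be checked without Elasticsearch. This is a minimal standalone sketch: cappedTf repeats the formula from the tf() override quoted above, defaultTf is the plain square root described in the message, and the class name CappedTfDemo is made up for illustration.

```java
public class CappedTfDemo {
    // Uncapped tf as described in the message: square root of the raw frequency.
    public static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Same expression as the tf() override in tfCappedSimilarity:
    // cap the frequency at 9 before taking the square root.
    public static float cappedTf(float freq) {
        return (float) Math.sqrt(Math.min(9, freq));
    }

    public static void main(String[] args) {
        float freq = 16f; // the word is repeated 16 times in the field
        System.out.println("default tf = " + defaultTf(freq)); // prints "default tf = 4.0"
        System.out.println("capped tf  = " + cappedTf(freq));  // prints "capped tf  = 3.0"
    }
}
```

A tf of 4 in the search explanation therefore suggests the default formula is still in use for that field, that is, the custom similarity was not applied.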

On Wednesday, March 26, 2014 at 17:16:36 UTC-4, Ivan Brusic wrote:

I updated my gist to illustrate the SimilarityProvider that goes
along with it. Similarities are easier to add to Elasticsearch than most
plugins. You just need to compile the two files into a jar and then add
that jar into Elasticsearch's classpath ($ES_HOME/lib most likely). The
code will scan for every SimilarityProvider defined and load it.

You then map the similarity to a field, as described in the
"Configuring similarity per field" section of the Elasticsearch mapping core
types documentation
(current/mapping-core-types.html#_configuring_similarity_per_field).

Note that you cannot change the similarity of a field
dynamically.

Ivan

On Wed, Mar 26, 2014 at 12:49 PM, geantbrun <agin.p...@gmail.com> wrote:

Britta is looping over words that are passed as parameters.
It's easy to implement her script for a simple query, but what about boolean
queries? As I understand it (but I could be wrong of course), I would have
to parse the query to call the script with each sub-clause. Am I wrong?

I prefer your custom similarity alternative. Again, sorry for
the silly question (newbie!) but where do you put your java file? Is it the
only thing that is needed (except for the modification in the mapping)?
cheers,
Patrick

On Wednesday, March 26, 2014 at 11:58:52 UTC-4, Ivan Brusic wrote:

I am still on a version of Elasticsearch that does not have
access to the new scoring capabilities, so I cannot test out any scripts.
The non normalized term frequency should be the line:
tf = _index[field][word].tf()

If that is the case, you could substitute that line with
something like:
tf = Math.min(10, _index[field][word].tf())

As I stated before, I am used to using Similarities, so I find
that example easier. Here is a custom similarity that I used in
Elasticsearch (it removes any norms that are indexed):
https://gist.github.com/brusic/9786587

The second part would be the tf() method, which you would need to
implement instead of the decodeNormValue I used.

Cheers,

Ivan

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/6370b4dc-8243-4aea-918a-e4e4e9588aaf%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Hi Patrick,

This issue is my fault. When I used my custom similarity, I was using
Elasticsearch 0.90.2 (which uses Lucene 4.3.1). It looks like this method
was changed in Lucene 4.4 to use a long instead of a byte:

http://lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/DefaultSimilarity.html#decodeNormValue(long)

I changed my pom.xml example to reference Elasticsearch 1.1 without
actually testing it. My apologies. Changing the method param from byte to
long should work, but then again, my previous assumption was also wrong. :)

Ivan
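
The symptom described earlier (the jar builds once @Override is commented out, but fieldNorm stays at 0.25) is what Java does when parameter types differ: the method becomes an overload, and the original is still the one called. Here is a minimal sketch with hypothetical stand-in classes (Base, ByteVersion, LongVersion), not the Lucene classes themselves; the return values 0.25 and 1.0 are just the numbers from this thread.

```java
public class OverrideDemo {
    // Stand-in for a base class whose method takes a long.
    public static class Base {
        public float decodeNormValue(long norm) { return 0.25f; }
    }

    // The parameter is a byte, so this is an overload, not an override.
    // Putting @Override on it would fail to compile.
    public static class ByteVersion extends Base {
        public float decodeNormValue(byte norm) { return 1.0f; }
    }

    // Same parameter type as the base method, so @Override is accepted.
    public static class LongVersion extends Base {
        @Override
        public float decodeNormValue(long norm) { return 1.0f; }
    }

    // Calls the method through the base type, the way a framework would.
    public static float viaBase(Base similarity, long norm) {
        return similarity.decodeNormValue(norm);
    }

    public static void main(String[] args) {
        System.out.println(viaBase(new ByteVersion(), 124L)); // prints 0.25
        System.out.println(viaBase(new LongVersion(), 124L)); // prints 1.0
    }
}
```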

Don't worry about it, your help is much appreciated Ivan!

Changing byte to long eliminates the last error, but I have a new one:
decodeNormValue(long) in org.elasticsearch.index.similarity.NormRemovalSimilarity cannot override decodeNormValue(long) in org.apache.lucene.search.similarities.DefaultSimilarity; overridden method is final

Is it due to a change in Lucene 4.4 too?
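
Whatever the answer about the Lucene version, the message itself is a general Java rule: a method declared final in a parent class cannot be overridden by a subclass. The sketch below uses hypothetical stand-in classes (AbstractSim, DefaultSim, NoNormSim), not Lucene's, to show the rule and one common way around it, which is to extend a class higher up that leaves the method open. Whether Lucene's class hierarchy permits that for decodeNormValue is an assumption this thread does not confirm, so the Javadoc linked above should be checked first.

```java
public class FinalMethodDemo {
    // Stand-in for an abstract parent that leaves the method open.
    public abstract static class AbstractSim {
        public abstract float decodeNormValue(long norm);

        // Some scoring step that relies on the decoded norm.
        public float score(long norm) {
            return 2.0f * decodeNormValue(norm);
        }
    }

    // Stand-in for a default implementation that seals the method with final.
    public static class DefaultSim extends AbstractSim {
        @Override
        public final float decodeNormValue(long norm) { return 0.25f; }
    }

    // A subclass of DefaultSim that redeclares decodeNormValue(long) would be
    // rejected by the compiler with "overridden method is final".

    // Extending the open parent instead compiles and takes effect.
    public static class NoNormSim extends AbstractSim {
        @Override
        public float decodeNormValue(long norm) { return 1.0f; }
    }

    public static void main(String[] args) {
        System.out.println(new DefaultSim().score(124L)); // prints 0.5
        System.out.println(new NoNormSim().score(124L));  // prints 2.0
    }
}
```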

Le mercredi 9 avril 2014 13:51:04 UTC-4, Ivan Brusic a écrit :

Hi Patrick,

This issue is my fault. When I used my custom similarity, I was using
Elasticsearch 0.90.2 (which uses Lucene 4.3.1). It looks like this method
was changed in Lucene 4.4 to use a long instead of a byte:

DefaultSimilarity (Lucene 4.7.1 API)

I changed my pom.xml example to reference Elasticsearch 1.1 without
actually testing it. My apologies. Changing the method param from byte to
long should work, but then again, my previous assumption was also wrong. :slight_smile:

Ivan

On Wed, Apr 9, 2014 at 10:29 AM, geantbrun <agin.p...@gmail.com<javascript:>

wrote:

Thank you again Ivan (and sorry for the silence, I was away these last
few days).
I made the jar with maven, the problem that I have now is a compilation
failure due to the override annotation in NormRemovalSimilarity.java ("method
does not override or implement a method from a supertype
"). When I put
the line in comment, the jar is built with success but I think that the new
decodeNormValue function is not overriding the original one (normal!).
Indeed, when I search my field contents that has
similarity=my_similarity, the explanation of the score is:

...
{
"value": 0.25,
"description": "fieldNorm(doc=0)"
}
...

I suppose that under the new similarity, the value should be 1.0,
shouldn't it?
Cheers,
Patrick

Le jeudi 3 avril 2014 12:15:15 UTC-4, Ivan Brusic a écrit :

I added a simple Maven pom to the gist: https://gist.github.com/
brusic/9786587#file-pom-xml

Easiest thing to do is download Maven (if you do not have it) and use it
take care handling the dependencies and build a jar if you simple execute:
mvn package

Since Elasticsearch already comes bundle with the correct jars, you can
also add those to your classpath instead. I think you only need Lucene
core, which is in $ES_HOME/lib/lucene-core-4-?-?.jar Substitute the
question marks for the correct version. I am not on Elasticsearch, so I do
not know offhand which version of Lucene is packaged.

--
Ivan

On Thu, Apr 3, 2014 at 7:44 AM, geantbrun agin.p...@gmail.com wrote:

Ivan,
Sorry but I realize (I'm totally unaware of Java) that I skipped the
java compile step (I simply put the java files in a jar file with jar cf).
The problem now is that executing :

javac NormRemovalSimilarity.java -classpath ./elasticsearch-1.1.0.jar

generates errors, the first one being:

package org.apache.lucene.search.similarities does not exist

Googled it but found nothing. Any idea?
Patrick

P.S. I installed elasticsearch following the easy wayhttps://gist.github.com/wingdspur/2026107(dpkg the deb file)

Le jeudi 3 avril 2014 09:16:02 UTC-4, geantbrun a écrit :

Thanks again for your great help Ivan. Does not work for me. When I
substitute NormRemovalSimilarityProvider by BM25SimilarityProvider (or
simply by BM25), it works. Is it possible that I put my jar file in the
wrong directory (usr/share/elasticsearch/lib)? Is it necessary to
register somewhere the new classes I define before restarting service?
Cheers,
Patrick

Le mercredi 2 avril 2014 17:47:46 UTC-4, Ivan Brusic a écrit :

Are you using a full class name? I have no problems with

curl -XPOST 'http://localhost:9200/sim/' -d '
{
"settings" : {
"similarity" : {
"my_similarity" : {
"type" : "org.elasticsearch.index.similarity.
NormRemovalSimilarityProvider"
}
}
},
"mappings" : {
"post" : {
"properties" : {
"id" : { "type" : "long", "store" : "yes", "precision_step" : "0"
},
"name" : { "type" : "string", "store" : "yes", "index" :
"analyzed"},
"contents" : { "type" : "string", "store" : "no", "index" :
"analyzed", "similarity" : "my_similarity"}
}
}
}
}
'

On Wed, Apr 2, 2014 at 12:03 PM, geantbrun agin.p...@gmail.comwrote:

In order to better understand the error, I copied your
NormRemovalSimilarity and NormRemovalSimilarityProvider code snippets in
usr/share/elasticsearch/lib. I put these 2 files in a jar named
NormRemovalSimilarity.jar. After restarting the elasticsearch service, I
tried to create the index with the same mapping as before (except that I
put "type" : "NormRemoval" in the settings of my_similarity.

The result is the same:
{"error":"IndexCreationException[[exbd] failed to create index];
nested: NoClassSettingsException[Failed to load class setting
[type] with value [NormRemoval]]; nested: ClassNotFoundException[org.
elasticsearch.index.similarity.normremoval.NormRemovalSimilar
ityProvider]; ","status":500}]

I deleted the jar file just to see if the error is the same: yes it
is. It's like the new similarity is never found or loaded. Is it still
working without modifications on your side?
Cheers,
Patrick

Le mercredi 2 avril 2014 00:31:44 UTC-4, Ivan Brusic a écrit :

It has been a while since I used a custom similarity, but what you
have looks right. Can you try a full class name instead? Use
org.elasticsearch.index.similarity.tfCappedSimilarityProvider.
According to the error, it is looking for
org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider.
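
The two error messages in this thread suggest a naming pattern for how a short "type" value ends up as a class name. The sketch below only reproduces that observed pattern; it is inferred from the quoted errors, not taken from the Elasticsearch source, and the class and method names (ProviderNameGuess, guessedClassName) are made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper: rebuilds the class names seen in the
// ClassNotFoundException messages quoted in this thread.
class ProviderNameGuess {

    // Package prefix and class suffix as they appear in the error messages.
    static final String PREFIX = "org.elasticsearch.index.similarity.";
    static final String SUFFIX = "SimilarityProvider";

    // Observed pattern: prefix + lowercased type + "." + type + suffix.
    static String guessedClassName(String type) {
        return PREFIX + type.toLowerCase(Locale.ROOT) + "." + type + SUFFIX;
    }

    public static void main(String[] args) {
        // "tfCappedSimilarity" yields the doubled "SimilaritySimilarity" name.
        System.out.println(guessedClassName("tfCappedSimilarity"));
        System.out.println(guessedClassName("NormRemoval"));
    }
}
```

If the pattern holds, the doubled "Similarity" in the error is just the suffix appended to a type value that already ends in "Similarity", which is consistent with the advice to give the fully qualified provider class name instead.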

--
Ivan

On Tue, Apr 1, 2014 at 7:00 AM, geantbrun agin.p...@gmail.com wrote:

Sure.

{
  "settings" : {
    "index" : {
      "similarity" : {
        "my_similarity" : {
          "type" : "tfCappedSimilarity"
        }
      }
    }
  },
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
        "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
        "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "my_similarity" }
      }
    }
  }
}

If I replace tfCappedSimilarity with tfCapped in the mapping,
the error is the same except that the provider is referred to as
tfCappedSimilarityProvider and not as
tfCappedSimilaritySimilarityProvider.
Cheers,
Patrick

On Monday, March 31, 2014 at 17:13:24 UTC-4, Ivan Brusic wrote:

Can you also post your mapping where you defined the similarity?

--
Ivan

On Mon, Mar 31, 2014 at 10:36 AM, geantbrun agin.p...@gmail.com wrote:

I realize that I probably have to define the similarity property
of my field as "my_similarity" (and not as "tfCappedSimilarity") and define
in the settings my_similarity as being of type tfCappedSimilarity.
When I do that, I get the following error at the index/mapping
creation:

{"error":"IndexCreationException[[exbd] failed to create
index]; nested: NoClassSettingsException[Failed to load class
setting [type] with value [tfCappedSimilarity]]; nested:
ClassNotFoundException[org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider];
","status":500}]

Note that the provider is referred to in the error as
tfCappedSimilaritySimilarityProvider ("Similarity" repeated twice).
Is that normal?
Patrick

On Monday, March 31, 2014 at 13:06:00 UTC-4, geantbrun wrote:

Hi Ivan,
I followed your instructions but it does not seem to work; I
must have gone wrong somewhere. I created the jar file from the following
two Java files. Could you tell me if they are OK?

tfCappedSimilarity.java


package org.elasticsearch.index.similarity;

import org.apache.lucene.search.similarities.DefaultSimilarity;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.Loggers;

public class tfCappedSimilarity extends DefaultSimilarity {

    private ESLogger logger;

    public tfCappedSimilarity() {
        logger = Loggers.getLogger(getClass());
    }

    /**
     * Capped tf value: frequencies above 9 are treated as 9 before the
     * square root is taken, so the result never exceeds 3.
     */
    @Override
    public float tf(float freq) {
        return (float) Math.sqrt(Math.min(9, freq));
    }

}

tfCappedSimilarityProvider.java


package org.elasticsearch.index.similarity;

import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;

public class tfCappedSimilarityProvider extends AbstractSimilarityProvider {

    private tfCappedSimilarity similarity;

    @Inject
    public tfCappedSimilarityProvider(@Assisted String name, @Assisted Settings settings) {
        super(name);
        this.similarity = new tfCappedSimilarity();
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public tfCappedSimilarity get() {
        return similarity;
    }

}

In my mapping, I define the similarity property of my field as
tfCappedSimilarity. Is that OK?

What makes me say that it does not work: I inserted a doc with a
word repeated 16 times in my field. When I search for that word, the
result shows a tf of 4 (square root of 16) and not 3 as I was expecting.
Is there a way to know if the similarity was loaded or not (maybe in a
log file)?
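
For reference, the expected numbers can be checked outside Elasticsearch. This is a minimal stand-alone sketch of the arithmetic only, assuming the default tf is the plain square root described above; the class name CappedTfDemo and its methods are hypothetical and nothing here touches Lucene.

```java
// Stand-alone check of the tf arithmetic discussed above (no Lucene needed).
class CappedTfDemo {

    // Same formula as the tf() override in tfCappedSimilarity:
    // cap the raw frequency at 9, then take the square root.
    static float cappedTf(float freq) {
        return (float) Math.sqrt(Math.min(9, freq));
    }

    // Plain square root, which is what the search result above reflects.
    static float uncappedTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        System.out.println("uncapped tf(16) = " + uncappedTf(16f)); // 4.0
        System.out.println("capped   tf(16) = " + cappedTf(16f));   // 3.0
        System.out.println("capped   tf(4)  = " + cappedTf(4f));    // 2.0
    }
}
```

A tf of 4 for 16 occurrences therefore points to the uncapped formula still being in use, that is, the custom class is not the one scoring the field.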

Cheers,
Patrick

On Wednesday, March 26, 2014 at 17:16:36 UTC-4, Ivan Brusic wrote:

I updated my gist to illustrate the SimilarityProvider that
goes along with it. Similarities are easier to add to Elasticsearch than
most plugins. You just need to compile the two files into a jar and then
add that jar into Elasticsearch's classpath ($ES_HOME/lib most likely). The
code will scan for every SimilarityProvider defined and load
it.

You then map the similarity to a field, as described in the
"configuring similarity per field" section of the Elasticsearch
reference (current/mapping-core-types.html#configuring_similarity_per_field).

Note that you cannot change the similarity of a field
dynamically.

Ivan

On Wed, Mar 26, 2014 at 12:49 PM, geantbrun <
agin.p...@gmail.com> wrote:

Britta is looping over words that are passed as parameters.
It's easy to implement her script for a simple query, but what about boolean
queries? In my understanding (but I could be wrong of course), I would have
to parse the query and call the script with each sub-clause. Is that right?

I prefer your custom similarity alternative. Again, sorry for
the silly question (newbie!) but where do you put your java file? Is it the
only thing that is needed (except for the modification in the mapping)?
cheers,
Patrick

On Wednesday, March 26, 2014 at 11:58:52 UTC-4, Ivan Brusic wrote:

I am still on a version of Elasticsearch that does not have
access to the new scoring capabilities, so I cannot test out any scripts.
The non normalized term frequency should be the line:
tf = _index[field][word].tf()

If that is the case, you could substitute that line with
something like:
tf = Math.min(10, _index[field][word].tf())

As I stated before, I am used to using Similarities, so I
find that approach easier. Here is a custom similarity that I used in
Elasticsearch (it removes any norms that are indexed):
Norm Removal Machine · GitHub

The second part would be the tf() method, which you would need to
implement instead of the decodeNormValue method I used.

Cheers,

Ivan

--
You received this message because you are subscribed to the
Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/6370b4dc-8243-4aea-918a-e4e4e9588aaf%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

That is just unfortunate. Sneaky. :)

Fortunately, in your case, you ultimately want to override the tf() method,
which is not marked final. If you want to still play around with the
similarity I created, you can always subclass TFIDFSimilarity and implement
the methods yourself (you can just copy the code from DefaultSimilarity).
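
The difference between the two methods can be illustrated without Lucene. The sketch below uses a hypothetical stand-in base class (StandInDefaultSimilarity) that only copies the two modifiers discussed here, a final decodeNormValue(long) and a non-final tf(float); the method bodies and the 0.25 value are placeholders, not Lucene's code.

```java
// Entry point: shows which of the two methods a subclass can replace.
class FinalMethodDemo {
    public static void main(String[] args) {
        CappedTfSimilarity sim = new CappedTfSimilarity();
        System.out.println("tf(16) = " + sim.tf(16f));                     // 3.0, overridden
        System.out.println("norm   = " + sim.decodeNormValue(124L));       // 0.25, inherited
    }
}

// Hypothetical stand-in for the library base class (not Lucene's code).
class StandInDefaultSimilarity {
    // final: subclasses may not redefine this method.
    final float decodeNormValue(long norm) {
        return 0.25f; // placeholder value
    }

    // Not final: subclasses are free to override it.
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

class CappedTfSimilarity extends StandInDefaultSimilarity {
    // Uncommenting the next line fails to compile with
    // "overridden method is final":
    // float decodeNormValue(long norm) { return 1.0f; }

    @Override
    float tf(float freq) {
        return (float) Math.sqrt(Math.min(9, freq));
    }
}
```

To change the final method's behaviour, the subclass has to extend something higher in the hierarchy where that method is still open, which is the TFIDFSimilarity suggestion above.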

It looks like using the function score might be the way to go. :) Hopefully
I get to play around with the text scoring soon once I upgrade. My service
has had 100% uptime in the past couple of years, but that comes at the
price of not upgrading often.

--
Ivan

On Wed, Apr 9, 2014 at 1:40 PM, geantbrun agin.patrick@gmail.com wrote:

Don't worry about it, your help is very much appreciated, Ivan!

Changing byte to long eliminates the last error, but I have a new one:
decodeNormValue(long) in
org.elasticsearch.index.similarity.NormRemovalSimilarity cannot override
decodeNormValue(long) in
org.apache.lucene.search.similarities.DefaultSimilarity;
overridden method is final

Is it due to a change in Lucene 4.4 too?

On Wednesday, April 9, 2014 at 13:51:04 UTC-4, Ivan Brusic wrote:

Hi Patrick,

This issue is my fault. When I used my custom similarity, I was using
Elasticsearch 0.90.2 (which uses Lucene 4.3.1). It looks like this method
was changed in Lucene 4.4 to use a long instead of a byte:

docs.lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/DefaultSimilarity.html#decodeNormValue(long)

I changed my pom.xml example to reference Elasticsearch 1.1 without
actually testing it. My apologies. Changing the method param from byte to
long should work, but then again, my previous assumption was also wrong. :)
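
Why the parameter type matters can be shown with plain Java. The sketch below uses a hypothetical stand-in base class (not Lucene's code, and deliberately not final) whose method takes a long: a subclass method taking a byte is only an overload, so the base version keeps being called, while a method with the matching long parameter is a real override. All class names and the 0.25 and 1.0 values are illustrative.

```java
// Entry point: compares an accidental overload with a real override.
class OverrideVsOverloadDemo {
    public static void main(String[] args) {
        System.out.println(new OverloadingSimilarity().fieldNorm(124L)); // 0.25
        System.out.println(new OverridingSimilarity().fieldNorm(124L));  // 1.0
    }
}

// Hypothetical stand-in for a base class whose method now takes a long.
class StandInBaseSimilarity {
    float decodeNormValue(long norm) {
        return 0.25f; // placeholder for a decoded norm
    }

    // Scoring code in the base class always calls the long version.
    float fieldNorm(long storedNorm) {
        return decodeNormValue(storedNorm);
    }
}

class OverloadingSimilarity extends StandInBaseSimilarity {
    // Adding @Override here would not compile ("method does not override or
    // implement a method from a supertype"); without it, this is just an
    // unrelated overload that fieldNorm() never calls.
    float decodeNormValue(byte norm) {
        return 1.0f;
    }
}

class OverridingSimilarity extends StandInBaseSimilarity {
    // Same parameter type as the base method, so this really overrides it.
    @Override
    float decodeNormValue(long norm) {
        return 1.0f;
    }
}
```

This matches the symptom of a jar that builds once @Override is commented out while the explanation still reports fieldNorm as 0.25.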

Ivan

On Wed, Apr 9, 2014 at 10:29 AM, geantbrun agin.p...@gmail.com wrote:

Thank you again Ivan (and sorry for the silence, I was away these last
few days).
I made the jar with Maven. The problem I have now is a compilation
failure due to the @Override annotation in NormRemovalSimilarity.java
("method does not override or implement a method from a supertype").
When I comment out that line, the jar builds successfully, but I think
the new decodeNormValue function is not overriding the original one
(as expected!). Indeed, when I search my contents field, which has
similarity=my_similarity, the explanation of the score is:

...
{
"value": 0.25,
"description": "fieldNorm(doc=0)"
}
...

I suppose that under the new similarity, the value should be 1.0,
shouldn't it?
Cheers,
Patrick

You're right, tf() is not marked as final and I can override it. It works
like a charm now, thanks to you, Ivan. Thank you so much; your help was
very much appreciated!
Cheers,
Patrick
