Custom Query variables?

Hi everyone,

I'm building a small geocoder on my own. My goal is to retrieve a big city
or a country from an input string. This string may be mistyped, so I
indexed the GeoNames cities5000 data (cities with more than 5000
inhabitants) and crossed these data with country & admin data. So I have an
index of 46000 cities with country, admin & population.

I created a search_field in which I put the country, admin & city names,
plus the alternate names provided in the cities5000 file.

I want to search for a string within this array.

Currently, I'm just searching with a MatchQuery, like "Paris" in
"search_field". Unfortunately, the first result is Paris... in Canada...

Yet here is the "search_field" data for Paris (CA) and Paris (FR):

[u'Paris', u'Paris', u'Canada', u'Ontario', u'Ontario']

[u'Paris', u'Paris', u'France', u'\xcele-de-France', u'Ile-de-France', u'Paris', u'Paris']

I don't understand why Paris, CA comes first, because "Paris" appears so
many more times in the second one...

Anyway, is there any way to make the number of occurrences of the
"my_query" terms make the difference? With alternate names, "Paris" will
appear so many more times that it has to count.

Actually, I think the array length matters in the scoring and I don't want
it to... I thought of a custom score query, but I don't think I can get the
query term in the script.

Any ideas?

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/edddf66e-9553-479b-bb68-dfef8b2ba36b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Up? Any ideas?

On Monday, 30 June 2014 at 17:48:54 UTC+2, Pierrick Boutruche wrote:


If you enable explanations, you can see the rationale behind Lucene's
scoring.
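As a rough sketch of what enabling explanations can look like, the search body below sets the `explain` flag next to the match query from the question. The helper name `build_explained_search` is made up for illustration, and sending the body to a cluster is left out on purpose.

```python
import json


def build_explained_search(field, text):
    """Build a search body that asks Elasticsearch to explain each hit's score."""
    return {
        "explain": True,  # adds an _explanation section to every hit
        "query": {"match": {field: text}},
    }


# Field name and query text are taken from the question.
body = build_explained_search("search_field", "Paris")
print(json.dumps(body, indent=2))
```

Each hit in the response should then carry an `_explanation` tree that breaks the score into its parts (term frequency, inverse document frequency, field norm).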

You are probably correct that the array length is influencing the scoring.
By default, Lucene uses length normalization, which scores fields with
fewer terms higher. You can disable norms on the field.
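A minimal sketch of such a mapping, written as a Python dict. It assumes the field is called `search_field` as in the question. The exact syntax depends on the Elasticsearch version: to my knowledge the 1.x releases expect `"norms": {"enabled": false}` on a `string` field, while recent releases expect `"norms": false` on a `text` field, so please check the mapping docs for your version. The helper name is hypothetical.

```python
def search_field_mapping(legacy_syntax=True):
    """Return a field mapping with length normalization (norms) turned off.

    legacy_syntax=True  -> form assumed for Elasticsearch 1.x string fields
    legacy_syntax=False -> form assumed for recent versions with text fields
    """
    if legacy_syntax:
        return {"type": "string", "norms": {"enabled": False}}
    return {"type": "text", "norms": False}


# Mapping fragment for the field used in the question.
mapping = {"properties": {"search_field": search_field_mapping()}}
print(mapping)
```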

You can fine-tune better by learning how to read Lucene's explanations. It
is difficult at first, but it is a useful skill.
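To make the length-normalization effect concrete, here is a rough back-of-the-envelope sketch for the two documents from the question. It assumes Lucene's classic TF-IDF similarity, where tf is the square root of the term frequency and the field norm is 1/sqrt(number of terms). It ignores idf and query normalization (identical for both documents with a single-term query) and the lossy one-byte encoding of norms, and the regex tokenizer is only an approximation of the standard analyzer. All function names are made up.

```python
import math
import re


def tokenize(values):
    """Approximate the analyzer: lowercase and split on non-word characters."""
    tokens = []
    for value in values:
        tokens.extend(re.findall(r"\w+", value.lower()))
    return tokens


def tf_norm_score(values, term, use_norms=True):
    """Classic TF-IDF pieces that differ per document: sqrt(freq) * field norm."""
    tokens = tokenize(values)
    if not tokens:
        return 0.0
    tf = math.sqrt(tokens.count(term.lower()))
    norm = 1.0 / math.sqrt(len(tokens)) if use_norms else 1.0
    return tf * norm


paris_ca = ["Paris", "Paris", "Canada", "Ontario", "Ontario"]
paris_fr = ["Paris", "Paris", "France", "Île-de-France", "Ile-de-France",
            "Paris", "Paris"]

for use_norms in (True, False):
    ca = tf_norm_score(paris_ca, "Paris", use_norms)
    fr = tf_norm_score(paris_fr, "Paris", use_norms)
    print("norms=%s  CA=%.3f  FR=%.3f" % (use_norms, ca, fr))
```

Under these assumptions the Canadian entry (2 hits in 5 tokens) scores about 0.63 with norms, slightly above the French one (4 hits in 11 tokens) at about 0.60. Without norms the order flips: about 1.41 for Canada against 2.0 for France.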

Cheers,

Ivan

On Tue, Jul 1, 2014 at 1:02 AM, Pierrick Boutruche pboutruche@octo.com
wrote:

Up? Any ideas?


For geo search, a good approach is to respect the searcher's preference by
using a locale, so I suggest adding a locale "fr" filter to the search.
Alternatively, add an origin to the query and order all cities by geo
distance from that origin. For country search, the origin could be the
capital city...
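Both ideas could be sketched roughly as follows, as Python dicts for the search body. The field names `locale` and `location` are hypothetical (the index described in the question may not have them; `location` would need to be mapped as a geo_point). The `bool` query with a `filter` clause is the form used by recent Elasticsearch versions, and older releases used a different filter syntax.

```python
def search_with_locale(field, text, locale, locale_field="locale"):
    """Match on the text and keep only documents with the given locale."""
    return {
        "query": {
            "bool": {
                "must": {"match": {field: text}},
                "filter": {"term": {locale_field: locale}},
            }
        }
    }


def search_near_origin(field, text, lat, lon, geo_field="location"):
    """Match on the text and order the hits by distance from an origin point."""
    return {
        "query": {"match": {field: text}},
        "sort": [
            {
                "_geo_distance": {
                    geo_field: {"lat": lat, "lon": lon},
                    "order": "asc",  # nearest city first
                    "unit": "km",
                }
            }
        ],
    }


# Example: a French-speaking user, with the origin set near Paris, France.
by_locale = search_with_locale("search_field", "Paris", "fr")
by_distance = search_near_origin("search_field", "Paris", 48.8566, 2.3522)
print(by_locale)
print(by_distance)
```

Note that sorting by distance replaces relevance ordering, so the text score then only decides which documents match, not their order.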

Jörg

On Wed, Jul 2, 2014 at 6:38 PM, Ivan Brusic ivan@brusic.com wrote:
