Relevancy sorting of result returned


(cyrilforce) #1

Hi All,

I am trying to query my ES index with the following queries:

It returned results that are not sorted by relevancy.

For instance, the results returned are:

  1. "_source": {
       "DISPLAY_NAME": "The Western Sheriff Rapid Fast Gunman",
       "LONG_DESCRIPTION": "The Western Sheriff Radid Fast Gunman is about a western sheriff who protects the justice. When the criminal gangs came to battle with the sheriff, come and protect them who you love Fighting!",
       "SHORT_DESCRIPTION": "The Western Sheriff Radid Fast Gunman is about a western sheriff who protect the justice. When the criminal gangs came to battle with the sheriff come and protect them who you love Fighting"
     }

  2. "_source": {
       "DISPLAY_NAME": "Beach Ball Crab Mayhem HD",
       "LONG_DESCRIPTION": "Crabs LOVE Volleyball! Come join in the fun with crazy locations and fun characters! You can even challenge your friends via Bluetooth!",
       "SHORT_DESCRIPTION": "Crabs LOVE Volleyball! Come join the fun and play your friends via Bluetooth!"
     }

  3. "_source": {
       "DISPLAY_NAME": "love",
       "LONG_DESCRIPTION": null,
       "SHORT_DESCRIPTION": "love"
     }

I would like to know how I could change the existing query to return results
ranked by relevancy on specific fields, e.g. so that 3) appears as the first
result (always using DISPLAY_NAME to determine the relevancy).

Also, is relevancy in ES based on the score?

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/13e04156-148b-43df-bb12-b6209e27eb1c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Binh Ly-2) #2

I'd start with a simple multi_match query:

{
  "query": {
    "multi_match": {
      "query": "love",
      "fields": [ "DISPLAY_NAME^2", "LONG_DESCRIPTION", "SHORT_DESCRIPTION" ]
    }
  }
}

Yes relevancy is based on the score.
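A minimal sketch of how the per-field scores of a multi_match like this one get combined, assuming the dis_max-style combination it uses by default (the explain outputs later in this thread report "max of:"). The per-field scores below are made up, and Lucene's queryNorm and coord factors are ignored, so only the resulting order is meaningful. `tie_breaker` plays the role of multi_match's `tie_breaker` parameter (0 by default):

```python
def dis_max(field_scores, boosts, tie_breaker=0.0):
    """Combine per-field scores like a dis_max query:
    best boosted score + tie_breaker * (sum of the other boosted scores)."""
    boosted = [boosts.get(field, 1.0) * s for field, s in field_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie_breaker * (sum(boosted) - best)


# Hypothetical raw per-field scores for one document matching "love".
doc = {"DISPLAY_NAME": 0.5, "SHORT_DESCRIPTION": 0.8}
print(dis_max(doc, {}))                     # 0.8: SHORT_DESCRIPTION wins
print(dis_max(doc, {"DISPLAY_NAME": 2.0}))  # 1.0: boosted DISPLAY_NAME wins
```

With the default tie_breaker of 0, only the best field counts, so a boost on DISPLAY_NAME decides whether that field's match or another field's match sets the document's score.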



(cyrilforce) #3

Hi,

Thanks for your reply. I would like to know what the DISPLAY_NAME^2 is for.
Does it indicate that priority should be given to the DISPLAY_NAME field?

Also, I tried the following:

{
  "from" : 0,
  "size" : 100,
  "query" : {
    "filtered" : {
      "query" : {
        "multi_match": {
          "query": "love",
          "fields": [ "DISPLAY_NAME^3", "LONG_DESCRIPTION", "SHORT_DESCRIPTION", "PERFORMER" ]
        }
      },
      "filter" : {
        "query" : {
          "bool" : {
            "must" : {
              "term" : {
                "CHANNEL_ID" : "1"
              }
            }
          }
        }
      }
    }
  }
}

However, I still find some records without "love" in DISPLAY_NAME
appearing before the ones that have it.

For example:

  1. "DISPLAY_NAME": "Singing Cowboy", "LONG_DESCRIPTION": null, "PERFORMER": "Love",
  2. "DISPLAY_NAME": "Love Is More Than Words Or Bet", "LONG_DESCRIPTION": null, "PERFORMER": "Love",

Thanks.

On Fri, Mar 28, 2014 at 9:27 PM, Binh Ly binhly_es@yahoo.com wrote:


--
Regards,

Chee Hoo



(Binh Ly-2) #4

^ is a boost, so it makes the match score higher. About your other
question, that's the default behavior of Lucene scoring, i.e., shorter
fields get higher relevancy for your query terms. You can disable norms
if you don't want this behavior:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#norms
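To see why the shorter field wins, here is a Python port, written from memory for illustration, of Lucene 4's default length norm (DefaultSimilarity: index-time boost / sqrt(number of terms)) and the lossy one-byte encoding it is stored in (SmallFloat.floatToByte315 / byte315ToFloat). Splitting on whitespace stands in for the real analyzer's term count, which is an assumption:

```python
import math
import struct


def float_to_byte315(f):
    """Lossy 8-bit norm encoding (port of SmallFloat.floatToByte315)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw float32 bits
    small = bits >> (24 - 3)
    fzero = (63 - 15) << 3
    if small <= fzero:
        return 0 if bits <= 0 else 1
    if small >= fzero + 0x100:
        return 255  # Java returns (byte) -1
    return small - fzero


def byte315_to_float(b):
    """Decode an 8-bit norm back to a float (port of SmallFloat.byte315ToFloat)."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def field_norm(num_terms, index_boost=1.0):
    """Length norm boost / sqrt(numTerms), as stored in (and read back from) one byte."""
    return byte315_to_float(float_to_byte315(index_boost / math.sqrt(num_terms)))


for title in ["Happy", "Be Happy"]:
    print(title, field_norm(len(title.split())))
# Happy 1.0
# Be Happy 0.625
```

With norms enabled, "Be Happy" is multiplied by 0.625 where "Happy" gets 1.0, so the one-word title ranks higher for the same term. With norms disabled, every document's fieldNorm is 1.0.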



(cyrilforce) #5

Hi Binh,

Thanks, that's excellent info. In addition, I would like to know how the
scores are actually calculated, as the two queries below yield different
results:

{
  "from" : 0,
  "size" : 100,
  "query" : {
    "filtered" : {
      "query" : {
        "multi_match": {
          "query": "love",
          "fields": [ "DISPLAY_NAME^6", "LONG_DESCRIPTION", "SHORT_DESCRIPTION", "PERFORMER" ]
        }
      },
      "filter" : {
        "query" : {
          "bool" : {
            "must" : {
              "term" : {
                "CHANNEL_ID" : "1"
              }
            }
          }
        }
      }
    }
  }
}

Result:
"_score": 1.372128,
"_source": {
"DISPLAY_NAME": "Listen To My Song",
"PRICE": 5,

{
  "from" : 0,
  "size" : 100,
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "should" : [ {
            "wildcard" : {
              "DISPLAY_NAME" : { "value": "love", "boost": 6 }
            }
          }, {
            "wildcard" : {
              "LONG_DESCRIPTION" : "love"
            }
          }, {
            "wildcard" : {
              "SHORT_DESCRIPTION" : "love"
            }
          }, {
            "wildcard" : {
              "PERFORMER" : "love"
            }
          } ]
        }
      },
      "filter" : {
        "query" : {
          "bool" : {
            "must" : {
              "term" : {
                "CHANNEL_ID" : "1"
              }
            }
          }
        }
      }
    }
  }
}

Result:

            "_score": 0.040032037,
            "_source": {
                "DISPLAY_NAME": "Listen To My Song",
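The much lower wildcard score is consistent with wildcard queries being rewritten to constant-score queries by default (assuming the default constant-score rewrite): no tf, idf or norms are involved, each matching clause contributes only boost * queryNorm, and the bool query then multiplies by coord (matching clauses / total clauses). A sketch of that classic-Lucene arithmetic; assuming the document matched exactly one of the three unboosted clauses (which one cannot be told from the excerpt), it lands on the reported 0.040032037:

```python
import math


def constant_score_bool(clause_boosts, matched):
    """Score of a bool query whose should clauses are all constant-score.

    queryNorm = 1 / sqrt(sum of boost**2 over all clauses)
    each matching clause scores boost * queryNorm
    the sum is multiplied by coord = matched clauses / total clauses
    """
    query_norm = 1.0 / math.sqrt(sum(b * b for b in clause_boosts.values()))
    total = sum(clause_boosts[name] * query_norm for name in matched)
    coord = len(matched) / len(clause_boosts)
    return total * coord


boosts = {"DISPLAY_NAME": 6.0, "LONG_DESCRIPTION": 1.0,
          "SHORT_DESCRIPTION": 1.0, "PERFORMER": 1.0}
# Hypothetical: the document matched only the PERFORMER clause.
print(constant_score_bool(boosts, ["PERFORMER"]))  # about 0.040032
```

The multi_match version, by contrast, scores with tf, idf and field norms, so the two numbers are not comparable.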

On Tue, Apr 1, 2014 at 5:59 AM, Binh Ly binhly_es@yahoo.com wrote:


--
Regards,

Chee Hoo



(Binh Ly-2) #6

If you specify explain=true in your query, it will tell you in detail how
the score is computed:

{
  "explain": true,
  "query": { ... }
}

Some useful info:

http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

http://jontai.me/blog/2012/10/lucene-scoring-and-elasticsearch-_all-field/
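A sketch of the classic Lucene TF-IDF score for a single term clause, following the TFIDFSimilarity formula linked above (score = queryWeight * fieldWeight). It assumes the boosted clause is the one that sets the query's sum of squared weights (and therefore queryNorm), as in a single-term query. Under that assumption the boost cancels out of the absolute score, which would explain why explain output shows a score equal to tf * idf * fieldNorm even with ^6; the boost only changes scores relative to the other clauses of the same query:

```python
import math


def tfidf_term_score(freq, idf, field_norm, boost, sum_of_squared_weights):
    """Classic Lucene (4.x DefaultSimilarity) score for one term clause.

    queryNorm   = 1 / sqrt(sumOfSquaredWeights)
    queryWeight = idf * boost * queryNorm
    fieldWeight = sqrt(freq) * idf * fieldNorm
    score       = queryWeight * fieldWeight   (coord ignored: single clause)
    """
    query_norm = 1.0 / math.sqrt(sum_of_squared_weights)
    query_weight = idf * boost * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


idf = 10.960511  # idf reported for DISPLAY_NAME:happy in one of this thread's explain outputs
for boost in (1.0, 6.0):
    # Single-clause query: sumOfSquaredWeights is (idf * boost) ** 2,
    # so queryWeight comes out as 1.0 and the boost cancels.
    score = tfidf_term_score(1, idf, 1.0, boost, (idf * boost) ** 2)
    print(boost, round(score, 6))  # 10.960511 for both boosts
```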



(cyrilforce) #7

Hi Binh,

Great. Thanks for that.

On Wed, Apr 2, 2014 at 12:05 AM, Binh Ly binhly_es@yahoo.com wrote:


--
Regards,

Chee Hoo



(cyrilforce) #8

Hi Binh,

The same problem again. I have the following query:

{
  "from" : 0,
  "size" : 100,
  "explain" : true,
  "query" : {
    "filtered" : {
      "query" : {
        "multi_match": {
          "query": "happy",
          "fields": [ "DISPLAY_NAME^6", "PERFORMER" ]
        }
      },
      "filter" : {
        "query" : {
          "bool" : {
            "must" : {
              "term" : {
                "CHANNEL_ID" : "1"
              }
            }
          }
        }
      }
    }
  }
}

However, #2 and #3 are returned in reverse order. I have added a boost to
DISPLAY_NAME, but it still yields the same behaviour:

  • {
      "_score": 10.960511,
      "_source": {
        "DISPLAY_NAME": "Happy",
        "PRICE": 5,
        "CHANNEL_ID": 1,
        "CAT_PARENT": 981,
        "MEDIA_ID": 390933,
        "GENRE": "Happy",
        "MEDIA_PKEY": "838644",
        "COMPOSER": null,
        "PLAYER": null,
        "CATMEDIA_NAME": "Happy",
        "FTID": null,
        "VIEW_ID": 43,
        "POSITION": 51399,
        "ITEMCODE": null,
        "CAT_ID": 982,
        "PRIORITY": 80,
        "CKEY": 757447,
        "CATMEDIA_RANK": 3,
        "BILLINGTYPE_ID": 1,
        "CAT_NAME": "POP",
        "KEYWORDS": null,
        "LONG_DESCRIPTION": null,
        "SHORT_DESCRIPTION": null,
        "TYPE_ID": 74,
        "ARTIST_GENDER": null,
        "PERFORMER": "Mario Pacchioli",
        "MAPPINGS": "1_43_982_POP_981_51399_5",
        "SHORTCODE": null,
        "CATMEDIA_CDATE": "2014-01-12T15:12:27.000Z",
        "LANG_ID": 1
      },
      "_explanation": {
        "value": 10.960511,
        "description": "max of:",
        "details": [
          {
            "value": 10.960511,
            "description": "weight(DISPLAY_NAME:happy^6.0 in 23025) [PerFieldSimilarity], result of:",
            "details": [
              {
                "value": 10.960511,
                "description": "fieldWeight in 23025, product of:",
                "details": [
                  {
                    "value": 1,
                    "description": "tf(freq=1.0), with freq of:",
                    "details": [
                      { "value": 1, "description": "termFreq=1.0" }
                    ]
                  },
                  { "value": 10.960511, "description": "idf(docFreq=58, maxDocs=1249243)" },
                  { "value": 1, "description": "fieldNorm(doc=23025)" }
                ]
              }
            ]
          }
        ]
      }
    }

  • {
      "_id": "10194",
      "_score": 10.699952,
      "_source": {
        "DISPLAY_NAME": "Be Happy",
        "PRICE": 1.5,
        "CHANNEL_ID": 1,
        "CAT_PARENT": 557,
        "MEDIA_ID": 10194,
        "GENRE": "Be Happy",
        "MEDIA_PKEY": "534570",
        "COMPOSER": null,
        "PLAYER": null,
        "CATMEDIA_NAME": "Be Happy",
        "FTID": null,
        "VIEW_ID": 241,
        "POSITION": 6733,
        "ITEMCODE": "33271",
        "CAT_ID": 558,
        "PRIORITY": 100,
        "CKEY": 528380,
        "CATMEDIA_RANK": 3,
        "BILLINGTYPE_ID": 1,
        "CAT_NAME": "POP",
        "KEYWORDS": null,
        "LONG_DESCRIPTION": null,
        "SHORT_DESCRIPTION": null,
        "TYPE_ID": 76,
        "ARTIST_GENDER": null,
        "PERFORMER": "Mary J. Blige",
        "MAPPINGS": "1_241_558_POP_557_6733_1.5",
        "SHORTCODE": "0012139471",
        "CATMEDIA_CDATE": "2014-01-26T20:04:46.000Z",
        "LANG_ID": 1
      },
      "_explanation": {
        "value": 10.699952,
        "description": "max of:",
        "details": [
          {
            "value": 10.699952,
            "description": "weight(DISPLAY_NAME:happy^6.0 in 9092) [PerFieldSimilarity], result of:",
            "details": [
              {
                "value": 10.699952,
                "description": "fieldWeight in 9092, product of:",
                "details": [
                  {
                    "value": 1,
                    "description": "tf(freq=1.0), with freq of:",
                    "details": [
                      { "value": 1, "description": "termFreq=1.0" }
                    ]
                  },
                  { "value": 10.699952, "description": "idf(docFreq=80, maxDocs=1321663)" },
                  { "value": 1, "description": "fieldNorm(doc=9092)" }
                ]
              }
            ]
          }
        ]
      }
    }

  • {
      "_score": 10.699952,
      "_source": {
        "DISPLAY_NAME": "Happy",
        "PRICE": 1.5,
        "CHANNEL_ID": 1,
        "CAT_PARENT": 557,
        "MEDIA_ID": 8615,
        "GENRE": "Happy",
        "MEDIA_PKEY": "533022",
        "COMPOSER": null,
        "PLAYER": null,
        "CATMEDIA_NAME": "Happy",
        "FTID": null,
        "VIEW_ID": 241,
        "POSITION": 5685,
        "ITEMCODE": "11927",
        "CAT_ID": 558,
        "PRIORITY": 100,
        "CKEY": 526838,
        "CATMEDIA_RANK": 3,
        "BILLINGTYPE_ID": 1,
        "CAT_NAME": "POP",
        "KEYWORDS": null,
        "LONG_DESCRIPTION": null,
        "SHORT_DESCRIPTION": null,
        "TYPE_ID": 76,
        "ARTIST_GENDER": null,
        "PERFORMER": "Ashanti",
        "MAPPINGS": "1_241_558_POP_557_5685_1.5",
        "SHORTCODE": "0012139036",
        "CATMEDIA_CDATE": "2014-01-26T20:03:44.000Z",
        "LANG_ID": 1
      },
      "_explanation": {
        "value": 10.699952,
        "description": "max of:",
        "details": [
          {
            "value": 10.699952,
            "description": "weight(DISPLAY_NAME:happy^6.0 in 11167) [PerFieldSimilarity], result of:",
            "details": [
              {
                "value": 10.699952,
                "description": "fieldWeight in 11167, product of:",
                "details": [
                  {
                    "value": 1,
                    "description": "tf(freq=1.0), with freq of:",
                    "details": [
                      { "value": 1, "description": "termFreq=1.0" }
                    ]
                  },
                  { "value": 10.699952, "description": "idf(docFreq=80, maxDocs=1321663)" },
                  { "value": 1, "description": "fieldNorm(doc=11167)" }
                ]
              }
            ]
          }
        ]
      }
    }

May I know how #2 and #3 can yield the same score even though they have
different text values? Also, how could I reverse #2 and #3? I want the
results returned by relevancy, so I assume they should be returned in
this order:

1) Happy
2) Happy
3) Be Happy

Thanks.

On Wed, Apr 2, 2014 at 6:28 PM, chee hoo lum cheehoo84@gmail.com wrote:


--
Regards,

Chee Hoo



(Ivan Brusic) #9

All the documents have the same score since they have the same field
weight, idf (always the same when you only have one search term) and term
frequency (each document has the term once).

It appears that you disabled norms on the DISPLAY_NAME field, since the
field norm is 1. Is this correct? Can you provide the mapping? If you
disable norms, you no longer get length normalization. Length normalization
would provide the ordering you desire, since the field norms penalize the
longer field, but it might not be ideal for every search. Relevancy
ultimately depends on you and your use cases. Another option is to enable
term vectors [1] (or index the number of terms yourself) and see if the
resulting field has the same number of tokens returned. Very kludgy.

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html
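The "index the number of terms yourself" idea can be approximated client-side. A sketch, assuming the hits have already been fetched and that whitespace splitting is close enough to the analyzer's token count (both assumptions): sort by score, then by the number of words in DISPLAY_NAME. Fed the three hits from #8, it produces the requested order:

```python
def rerank(hits):
    """Sort hits by _score (highest first); break ties by the number of
    whitespace-separated words in DISPLAY_NAME (fewest first)."""
    return sorted(
        hits,
        key=lambda h: (-h["_score"], len(h["_source"]["DISPLAY_NAME"].split())),
    )


hits = [
    {"_score": 10.960511, "_source": {"DISPLAY_NAME": "Happy"}},
    {"_score": 10.699952, "_source": {"DISPLAY_NAME": "Be Happy"}},
    {"_score": 10.699952, "_source": {"DISPLAY_NAME": "Happy"}},
]
print([h["_source"]["DISPLAY_NAME"] for h in rerank(hits)])
# ['Happy', 'Happy', 'Be Happy']
```

This only reorders the page that was already fetched. Doing it server-side would mean indexing a term count in a field of your own (hypothetical) and sorting on `_score` first, then on that field.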

Cheers,

Ivan

On Wed, Apr 2, 2014 at 4:02 AM, chee hoo lum cheehoo84@gmail.com wrote:



(cyrilforce) #10

Hi Ivan,

Nope, I didn't disable norms. Here's the mapping:

{
  "media": {
    "properties": {
      "AUDIO": { "type": "string" },
      "BILLINGTYPE_ID": { "type": "long" },
      "CATMEDIA_CDATE": { "type": "date", "format": "dateOptionalTime" },
      "CATMEDIA_NAME": { "type": "string" },
      "CATMEDIA_RANK": { "type": "long" },
      "CAT_ID": { "type": "long" },
      "CAT_NAME": { "type": "string", "analyzer": "string_lowercase", "include_in_all": true },
      "CAT_PARENT": { "type": "long" },
      "CHANNEL_ID": { "type": "long" },
      "CKEY": { "type": "long" },
      "DISPLAY_NAME": { "type": "string" },
      "FTID": { "type": "string" },
      "GENRE": { "type": "string" },
      "ITEMCODE": { "type": "string" },
      "KEYWORDS": { "type": "string" },
      "LANG_ID": { "type": "long" },
      "LONG_DESCRIPTION": { "type": "string" },
      "MAPPINGS": { "type": "string", "analyzer": "string_lowercase", "include_in_all": true },
      "MEDIA_ID": { "type": "long" },
      "MEDIA_PKEY": { "type": "string" },
      "PERFORMER": { "type": "string" },
      "PLAYER": { "type": "string" },
      "POSITION": { "type": "long" },
      "PRICE": { "type": "double" },
      "PRIORITY": { "type": "long" },
      "SHORTCODE": { "type": "string" },
      "SHORT_DESCRIPTION": { "type": "string" },
      "TYPE_ID": { "type": "long" },
      "VIEW_ID": { "type": "long" }
    }
  }
}

My client is nagging about the relevancy of the results returned. You know,
business users always compare with Google search results and stuff, lol.
For now I am scratching my head trying to sort this problem out. My use
case is to search by DISPLAY_NAME and PERFORMER and show the closest
matches at the top of the list.

For example:

1) Happy
2) Happy
3) Be Happy

It would be deeply appreciated if you could shed some light on this. Thanks.

On Thu, Apr 3, 2014 at 1:51 AM, Ivan Brusic ivan@brusic.com wrote:


--
Regards,

Chee Hoo



(cyrilforce) #11

Hi,

I discovered that the score values are influenced by the shard and node
where each document is stored.

So I specified the preference and search_type parameters in the search
request, but I still can't get the results I want.
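For context, a minimal sketch of why the search type matters, assuming Lucene 4.x's classic idf formula idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduces the idf values in the explain output in this thread. With the default query_then_fetch, each shard computes idf from its own docFreq and maxDocs, so the same title can score differently depending on which shard holds it; dfs_query_then_fetch first collects these statistics from all shards and scores with the totals. The shard statistics below are made up for illustration. Note that this only evens out idf; it does not touch per-document factors such as tf or fieldNorm.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Lucene 4.x DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Hypothetical per-shard statistics for the term "happy" (illustrative only,
# not taken from the thread).
SHARD_STATS = {
    0: {"doc_freq": 120, "max_docs": 1_300_000},
    1: {"doc_freq": 80, "max_docs": 1_320_000},
}


def query_then_fetch_idf(stats: dict) -> dict:
    """Default search type: each shard scores with its own local statistics."""
    return {shard: idf(s["doc_freq"], s["max_docs"]) for shard, s in stats.items()}


def dfs_query_then_fetch_idf(stats: dict) -> float:
    """dfs_query_then_fetch: statistics are gathered from every shard first,
    so all shards score with the same summed docFreq and maxDocs."""
    total_doc_freq = sum(s["doc_freq"] for s in stats.values())
    total_max_docs = sum(s["max_docs"] for s in stats.values())
    return idf(total_doc_freq, total_max_docs)


if __name__ == "__main__":
    for shard, value in query_then_fetch_idf(SHARD_STATS).items():
        print(f"shard {shard}: local idf = {value:.4f}")
    print(f"dfs (global) idf = {dfs_query_then_fetch_idf(SHARD_STATS):.4f}")
```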

The query:

/media/_search?pretty=&search_type=dfs_query_then_fetch&preference=_primary

{
  "from" : 0,
  "size" : 100,
  "explain" : true,
  "query" : {
    "filtered" : {
      "query" : {
        "multi_match" : {
          "query" : "happy",
          "fields" : [ "DISPLAY_NAME" ]
        }
      },
      "filter" : {
        "query" : {
          "bool" : {
            "must" : {
              "term" : {
                "CHANNEL_ID" : "1"
              }
            }
          }
        }
      }
    }
  }
}

Results:

1.
{
  "_shard": 0,
  "_node": "kr37FCksStOKW5ZCo6PCwQ",
  "_index": "jdbc_dev",
  "_type": "media",
  "_id": "27071",
  "_score": 10.450976,
  "_source": {
    "DISPLAY_NAME": "Happy",
    "PRICE": 1.5,
    ....
  },
  "_explanation": {
    "value": 10.450976,
    "description": "weight(DISPLAY_NAME:happy in 2210) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 10.450976,
        "description": "fieldWeight in 2210, product of:",
        "details": [
          {
            "value": 1,
            "description": "tf(freq=1.0), with freq of:",
            "details": [
              { "value": 1, "description": "termFreq=1.0" }
            ]
          },
          { "value": 10.450976, "description": "idf(docFreq=501, maxDocs=6385732)" },
          { "value": 1, "description": "fieldNorm(doc=2210)" }
        ]
      }
    ]
  }
}

2.
{
  "_shard": 0,
  "_node": "kr37FCksStOKW5ZCo6PCwQ",
  "_index": "jdbc_dev",
  "_type": "media",
  "_id": "565689",
  "_score": 10.450976,
  "_source": {
    "DISPLAY_NAME": "Be Happy",
    "PRICE": 1.5,
    ....
  },
  "_explanation": {
    "value": 10.450976,
    "description": "weight(DISPLAY_NAME:happy in 10189) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 10.450976,
        "description": "fieldWeight in 10189, product of:",
        "details": [
          {
            "value": 1,
            "description": "tf(freq=1.0), with freq of:",
            "details": [
              { "value": 1, "description": "termFreq=1.0" }
            ]
          },
          { "value": 10.450976, "description": "idf(docFreq=501, maxDocs=6385732)" },
          { "value": 1, "description": "fieldNorm(doc=10189)" }
        ]
      }
    ]
  }
}

3.
{
  "_shard": 0,
  "_node": "kr37FCksStOKW5ZCo6PCwQ",
  "_index": "jdbc_dev",
  "_type": "media",
  "_id": "425585",
  "_score": 10.450976,
  "_source": {
    "DISPLAY_NAME": "Happy",
    "PRICE": 4,
    .....
  },
  "_explanation": {
    "value": 10.450976,
    "description": "weight(DISPLAY_NAME:happy in 10367) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 10.450976,
        "description": "fieldWeight in 10367, product of:",
        "details": [
          {
            "value": 1,
            "description": "tf(freq=1.0), with freq of:",
            "details": [
              { "value": 1, "description": "termFreq=1.0" }
            ]
          },
          { "value": 10.450976, "description": "idf(docFreq=501, maxDocs=6385732)" },
          { "value": 1, "description": "fieldNorm(doc=10367)" }
        ]
      }
    ]
  }
}

It is weird that all three hits get the same score even though their
DISPLAY_NAME values are not the same. I didn't disable norms.
Does anyone have any idea?
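The explain output above already contains every factor of the score. A minimal sketch, assuming Lucene 4.x's classic similarity (fieldWeight = sqrt(freq) * idf * fieldNorm, with idf = 1 + ln(maxDocs / (docFreq + 1))): idf depends only on the query term and the index statistics, so it is identical for every hit, and each of the three hits reports freq = 1 and fieldNorm = 1. With all inputs equal, the scores have to tie. The per-hit inputs are copied from the results above; the helper names are my own.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Lucene 4.x DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, idf_value: float, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf_value * field_norm


# Per-hit inputs copied from the explain output above.
HITS = {
    "27071 (Happy)": {"freq": 1.0, "field_norm": 1.0},
    "565689 (Be Happy)": {"freq": 1.0, "field_norm": 1.0},
    "425585 (Happy)": {"freq": 1.0, "field_norm": 1.0},
}

term_idf = idf(doc_freq=501, max_docs=6385732)  # shared by every hit
scores = {doc: field_weight(h["freq"], term_idf, h["field_norm"]) for doc, h in HITS.items()}

for doc, value in scores.items():
    print(f"{doc}: {value:.5f}")
```

Since idf is shared, the only per-document factors left are the term frequency and the field norm, so a fieldNorm of 1 on the two-word "Be Happy" is the value worth investigating.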

On Thu, Apr 3, 2014 at 2:01 AM, chee hoo lum cheehoo84@gmail.com wrote:


--
Regards,

Chee Hoo



(Ivan Brusic) #12

The number of shards only affects the inverse document frequency. Items
such as the norm are document-specific and are not affected by the number
of shards.

I did not notice it before, but the document with DISPLAY_NAME of "Be
Happy" is probably scoring the same as the others because "Be" is a stop
word and therefore removed from the index. You end up matching Happy with
Happy, which is the same as the other documents.

Try using an analyzer without stopwords. Query tuning is hard work.

Cheers,

Ivan
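Ivan's explanation can be checked with a small sketch of the length norm. It assumes Lucene's classic lengthNorm = 1 / sqrt(number of indexed tokens), and assumes the pre-1.0 standard analyzer applied Lucene's default English stop set, which contains "be". The tokenizer is a crude lowercase/whitespace stand-in for the real standard tokenizer, and the idf value is copied from the explain output in this thread. With stopwords removed, "Be Happy" is indexed as the single token happy and ties with "Happy"; with stopwords kept, its norm drops to 1/sqrt(2) and it ranks below "Happy".

```python
import math

# Lucene's default English stop set, which (as far as I know) the standard
# analyzer applied in Elasticsearch versions before 1.0.
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
}


def analyze(text: str, remove_stopwords: bool) -> list:
    """Crude stand-in for the standard analyzer: lowercase + whitespace split."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in ENGLISH_STOP_WORDS]
    return tokens


def score(text: str, term: str, idf_value: float, remove_stopwords: bool) -> float:
    """tf * idf * lengthNorm, with lengthNorm = 1 / sqrt(number of indexed tokens).

    Lucene stores the norm lossily in a single byte, so real explain output
    shows a rounded fieldNorm; this sketch uses the unrounded value.
    """
    tokens = analyze(text, remove_stopwords)
    freq = tokens.count(term)
    if freq == 0:
        return 0.0
    length_norm = 1.0 / math.sqrt(len(tokens))
    return math.sqrt(freq) * idf_value * length_norm


IDF_HAPPY = 10.450976  # idf value from the explain output in this thread

for remove in (True, False):
    label = "stopwords removed" if remove else "stopwords kept"
    for title in ("Happy", "Be Happy"):
        print(f"{label:17} {title:9} {score(title, 'happy', IDF_HAPPY, remove):.4f}")
```

Two hits whose DISPLAY_NAME is exactly "Happy" would still tie with each other, so their relative order is not decided by the score.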

On Fri, Apr 4, 2014 at 2:46 AM, chee hoo lum cheehoo84@gmail.com wrote:



(cyrilforce) #13

Hi Ivan,

I am trying to disable stopwords, and I am using an ES version earlier
than 1.0. This is the query I ran:

{
  "explain": true,
  "query": {
    "match_phrase": {
      "DISPLAY_NAME": {
        "query": "happy",
        "operator": "and",
        "analyzer": { "stop" : { "type":"stop", "stopwords" : "none" }}
      }
    }
  }
}

However, it throws this error:

"error": "SearchPhaseExecutionException[Failed to execute phase [dfs], total failure;
shardFailures {[kr37FCksStOKW5ZCo6PCwQ][jdbc_dev][0]: SearchParseException[[jdbc_dev][0]:
from[-1],size[-1]: Parse Failure [Failed to parse source [{\n "explain": true,\n "query": {\n
"match_phrase": {\n "DISPLAY_NAME": {\n "query": "happy",\n "operator": "and",\n "analyzer": {
"stop" : { "type":"stop", "stopwords" : "none" }}\n }\n }\n }\n}]]]; nested:
QueryParsingException[[jdbc_dev] [match] query does not support [stopwords]]; }{[VYQt633MTUuJdAwL--PE3A][jdbc_dev][1]:

How do I properly set stopwords = "none" in the query, or is this not
available in ES versions prior to 1.0?
I can't find any relevant information in the documentation. Thanks.
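As far as I know, the `analyzer` option of a match or match_phrase query takes the *name* of an analyzer (a built-in one or one registered in the index settings), not an inline definition, which is likely why the query fails to parse. Also, because "be" was already dropped at index time, changing only the query-side analyzer would not bring it back: DISPLAY_NAME has to be reindexed with an analyzer that keeps stopwords. The sketch below builds the request bodies as Python dicts. The analyzer name `standard_no_stop` and both helper functions are hypothetical, and `"_none_"` assumes your 0.90.x release accepts that value (an empty list is the alternative I would try otherwise). Once the mapping uses the new analyzer, the query-time `analyzer` is optional, since a match query defaults to the field's analyzer.

```python
import json

ANALYZER = "standard_no_stop"  # hypothetical analyzer name


def index_body(analyzer_name: str = ANALYZER) -> dict:
    """Body for creating a new index whose DISPLAY_NAME keeps stopwords."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # standard analyzer with an empty stop list
                    analyzer_name: {"type": "standard", "stopwords": "_none_"}
                }
            }
        },
        "mappings": {
            "media": {
                "properties": {
                    # pre-5.0 "string" type, as in the mapping posted earlier
                    "DISPLAY_NAME": {"type": "string", "analyzer": analyzer_name}
                }
            }
        },
    }


def display_name_query(text: str, analyzer_name: str = ANALYZER) -> dict:
    """match query that refers to the analyzer by name, not by inline definition."""
    return {
        "explain": True,
        "query": {
            "match": {
                "DISPLAY_NAME": {"query": text, "analyzer": analyzer_name}
            }
        },
    }


print(json.dumps(index_body(), indent=2))
print(json.dumps(display_name_query("be happy"), indent=2))
```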

On Fri, Apr 4, 2014 at 10:11 PM, Ivan Brusic ivan@brusic.com wrote:

The number of shards only affects the inverse document frequency. Items
such as the norm are document specific and are not affected by the number
of shards.

I did not notice it before, but the document with DISPLAY_NAME of "Be
Happy" is probably scoring the same as the others because "Be" is a stop
word and therefore removed from the index. You end up matching Happy with
Happy, which is the same as the other documents.

Try using an analyzer without stopwords. Query tuning is hard work.

Cheers,

Ivan

On Fri, Apr 4, 2014 at 2:46 AM, chee hoo lum cheehoo84@gmail.com wrote:

Hi,

Discovered that the score values are influenced by the shards and nodes
where the document stored.

Therefore specified the preference and query_type in the search query
however i still have no idea to get the result i wanted.

*The query : *

/media/_search?pretty=&search_type=dfs_query_then_fetch&preference=_primary

{
"from" : 0,
"size" : 100,
"explain" : true,
"query" : {

"filtered" : {
  "query" : {
     "multi_match": {
  "query": "happy",
  "fields": [ "DISPLAY_NAME" ]
}
  },
  "filter" : {
    "query" : {
      "bool" : {
      "must" : {
        "term" : {
          "CHANNEL_ID" : "1"
        }
      }
    }
    }
  }
}

}

}

*Results : *

  • "_shard": 0,*
    "_node": "kr37FCksStOKW5ZCo6PCwQ",
    "_index": "jdbc_dev",
    "_type": "media",
    "_id": "27071",
    "_score": 10.450976,
    "_source": {
    "DISPLAY_NAME": "Happy",
    "PRICE": 1.5,
    ....
    "_explanation": {
    "value": 10.450976,
    "description": "weight(DISPLAY_NAME:happy in 2210)
    [PerFieldSimilarity], result of:",
    "details": [
    {
    "value": 10.450976,
    "description": "fieldWeight in 2210, product
    of:",
    "details": [
    {
    "value": 1,
    "description": "tf(freq=1.0), with
    freq of:",
    "details": [
    {
    "value": 1,
    "description": "termFreq=1.0"
    }
    ]
    },
    {
    "value": 10.450976,
    "description": "idf(docFreq=501,
    maxDocs=6385732)"
    },
    {
    "value": 1,
    "description": "fieldNorm(doc=2210)"
    }
    ]
    }
    ]
    }
  1. "_shard": 0,
    "_node": "kr37FCksStOKW5ZCo6PCwQ",
    "_index": "jdbc_dev",
    "_type": "media",
    "_id": "565689",
    "_score": 10.450976,
    "_source": {
    "DISPLAY_NAME": "Be Happy",
    "PRICE": 1.5,
    ....
    "_explanation": {
    "value": 10.450976,
    "description": "weight(DISPLAY_NAME:happy in 10189)
    [PerFieldSimilarity], result of:",
    "details": [
    {
    "value": 10.450976,
    "description": "fieldWeight in 10189, product
    of:",
    "details": [
    {
    "value": 1,
    "description": "tf(freq=1.0), with
    freq of:",
    "details": [
    {
    "value": 1,
    "description": "termFreq=1.0"
    }
    ]
    },
    {
    "value": 10.450976,
    "description": "idf(docFreq=501,
    maxDocs=6385732)"
    },
    {
    "value": 1,
    "description": "fieldNorm(doc=10189)"
    }
    ]
    }
    ]
    }

  • "_shard": 0,*
    "_node": "kr37FCksStOKW5ZCo6PCwQ",
    "_index": "jdbc_dev",
    "_type": "media",
    "_id": "425585",
    "_score": 10.450976,
    "_source": {
    "DISPLAY_NAME": "Happy",
    "PRICE": 4,
    .....
    "_explanation": {
    "value": 10.450976,
    "description": "weight(DISPLAY_NAME:happy in 10367)
    [PerFieldSimilarity], result of:",
    "details": [
    {
    "value": 10.450976,
    "description": "fieldWeight in 10367, product
    of:",
    "details": [
    {
    "value": 1,
    "description": "tf(freq=1.0), with
    freq of:",
    "details": [
    {
    "value": 1,
    "description": "termFreq=1.0"
    }
    ]
    },
    {
    "value": 10.450976,
    "description": "idf(docFreq=501,
    maxDocs=6385732)"
    },
    {
    "value": 1,
    "description": "fieldNorm(doc=10367)"
    }
    ]
    }
    ]
    }
    },

It is strange that these hits get the same score even though their
DISPLAY_NAME values differ. I didn't disable norms.
Does anyone have an idea?

On Thu, Apr 3, 2014 at 2:01 AM, chee hoo lum cheehoo84@gmail.com wrote:

Hi Ivan,

Nope, I didn't disable norms. Here's the mapping:

{
  "media": {
    "properties": {
      "AUDIO": {
        "type": "string"
      },
      "BILLINGTYPE_ID": {
        "type": "long"
      },
      "CATMEDIA_CDATE": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "CATMEDIA_NAME": {
        "type": "string"
      },
      "CATMEDIA_RANK": {
        "type": "long"
      },
      "CAT_ID": {
        "type": "long"
      },
      "CAT_NAME": {
        "type": "string",
        "analyzer": "string_lowercase",
        "include_in_all": true
      },
      "CAT_PARENT": {
        "type": "long"
      },
      "CHANNEL_ID": {
        "type": "long"
      },
      "CKEY": {
        "type": "long"
      },
      "DISPLAY_NAME": {
        "type": "string"
      },
      "FTID": {
        "type": "string"
      },
      "GENRE": {
        "type": "string"
      },
      "ITEMCODE": {
        "type": "string"
      },
      "KEYWORDS": {
        "type": "string"
      },
      "LANG_ID": {
        "type": "long"
      },
      "LONG_DESCRIPTION": {
        "type": "string"
      },
      "MAPPINGS": {
        "type": "string",
        "analyzer": "string_lowercase",
        "include_in_all": true
      },
      "MEDIA_ID": {
        "type": "long"
      },
      "MEDIA_PKEY": {
        "type": "string"
      },
      "PERFORMER": {
        "type": "string"
      },
      "PLAYER": {
        "type": "string"
      },
      "POSITION": {
        "type": "long"
      },
      "PRICE": {
        "type": "double"
      },
      "PRIORITY": {
        "type": "long"
      },
      "SHORTCODE": {
        "type": "string"
      },
      "SHORT_DESCRIPTION": {
        "type": "string"
      },
      "TYPE_ID": {
        "type": "long"
      },
      "VIEW_ID": {
        "type": "long"
      }
    }
  }
}

My client keeps complaining about the relevancy of the results. You know
how business users always compare everything with Google search results. lol.
For now I'm scratching my head trying to sort this out. My use case is to
search by DISPLAY_NAME and PERFORMER and show the closest matches at the
top of the list.

For example:

1) Happy
2) Happy
3) Be Happy

I'd deeply appreciate it if you could shed some light on this. Thanks.

On Thu, Apr 3, 2014 at 1:51 AM, Ivan Brusic ivan@brusic.com wrote:

All the documents have the same score since they have the same field
weight, idf (always the same when you only have one search term) and term
frequency (each document has the term once).

It appears that you disabled norms on the DISPLAY_NAME field, since the
field norm is 1. Is this correct? Can you provide the mapping? If you
disable norms, you lose length normalization, which would otherwise
provide the ordering you want, since field norms penalize longer fields;
it might not be ideal for every search, though. Relevancy ultimately
depends on you and your use cases. Another option is to enable term
vectors [1] (or index the number of terms yourself) and check whether the
field has the same number of tokens as the query. Very kludgy.

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html
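A rough client-side version of the "count the terms yourself" idea, sketched in Python. Everything here is an assumption for illustration: the `Hit` class and `rerank` helper are hypothetical, words are counted by splitting the raw DISPLAY_NAME on whitespace (so a stop word like "Be" still counts), and the scores are the ones reported for the Apr 2 query. Hits whose DISPLAY_NAME has as many words as the query go first, and the Elasticsearch score only orders hits within each group.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical, trimmed-down view of a search hit
    display_name: str
    score: float


def word_count(text: str) -> int:
    # Assumption: count raw whitespace-separated words, so a stop word
    # such as "Be" still contributes to the length.
    return len(text.split())


def rerank(hits, query):
    """Put hits whose DISPLAY_NAME has as many words as the query first,
    then order by descending score within each group."""
    q_len = word_count(query)
    return sorted(
        hits,
        key=lambda h: (word_count(h.display_name) != q_len, -h.score),
    )


# Scores as reported in the explain output for the Apr 2 query
hits = [
    Hit("Happy", 10.960511),
    Hit("Be Happy", 10.699952),
    Hit("Happy", 10.699952),
]
ranked = rerank(hits, "happy")
print([h.display_name for h in ranked])
```

This only reorders the hits already returned, so it works within a page of results; doing the same inside Elasticsearch would mean indexing the term count yourself.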

Cheers,

Ivan

On Wed, Apr 2, 2014 at 4:02 AM, chee hoo lum cheehoo84@gmail.com wrote:

Hi Binh,

Same problem again. I'm running the following query:

{
  "from" : 0,
  "size" : 100,
  "explain" : true,
  "query" : {
    "filtered" : {
      "query" : {
        "multi_match": {
          "query": "happy",
          "fields": [ "DISPLAY_NAME^6", "PERFORMER" ]
        }
      },
      "filter" : {
        "query" : {
          "bool" : {
            "must" : {
              "term" : {
                "CHANNEL_ID" : "1"
              }
            }
          }
        }
      }
    }
  }
}

However, #2 and #3 come back in the reverse of the order I expect. I
added a boost to DISPLAY_NAME, but it still behaves the same way:

    "_score": 10.960511,
    "_source": {
    "DISPLAY_NAME": "Happy",
    "PRICE": 5,
    "CHANNEL_ID": 1,
    "CAT_PARENT": 981,
    "MEDIA_ID": 390933,
    "GENRE": "Happy",
    "MEDIA_PKEY": "838644",
    "COMPOSER": null,
    "PLAYER": null,
    "CATMEDIA_NAME": "Happy",
    "FTID": null,
    "VIEW_ID": 43,
    "POSITION": 51399,
    "ITEMCODE": null,
    "CAT_ID": 982,
    "PRIORITY": 80,
    "CKEY": 757447,
    "CATMEDIA_RANK": 3,
    "BILLINGTYPE_ID": 1,
    "CAT_NAME": "POP",
    "KEYWORDS": null,
    "LONG_DESCRIPTION": null,
    "SHORT_DESCRIPTION": null,
    "TYPE_ID": 74,
    "ARTIST_GENDER": null,
    "PERFORMER": "Mario Pacchioli",
    "MAPPINGS": "1_43_982_POP_981_51399_5",
    "SHORTCODE": null,
    "CATMEDIA_CDATE": "2014-01-12T15:12:27.000Z",
    "LANG_ID": 1
    },
    "_explanation": {
    "value": 10.960511,
    "description": "max of:",
    "details": [
    {
    "value": 10.960511,
    "description":
    "weight(DISPLAY_NAME:happy^6.0 in 23025) [PerFieldSimilarity], result of:",
    "details": [
    {
    "value": 10.960511,
    "description": "fieldWeight in
    23025, product of:",
    "details": [
    {
    "value": 1,
    "description":
    "tf(freq=1.0), with freq of:",
    "details": [
    {
    "value": 1,
    "description":
    "termFreq=1.0"
    }
    ]
    },
    {
    "value": 10.960511,
    "description":
    "idf(docFreq=58, maxDocs=1249243)"
    },
    {
    "value": 1,
    "description":
    "fieldNorm(doc=23025)"
    }
    ]
    }
    ]
    }
    ]
    }
    }

"_id": "10194",
"_score": 10.699952,
"_source": {
"DISPLAY_NAME": "Be Happy",
"PRICE": 1.5,
"CHANNEL_ID": 1,
"CAT_PARENT": 557,
"MEDIA_ID": 10194,
"GENRE": "Be Happy",
"MEDIA_PKEY": "534570",
"COMPOSER": null,
"PLAYER": null,
"CATMEDIA_NAME": "Be Happy",
"FTID": null,
"VIEW_ID": 241,
"POSITION": 6733,
"ITEMCODE": "33271",
"CAT_ID": 558,
"PRIORITY": 100,
"CKEY": 528380,
"CATMEDIA_RANK": 3,
"BILLINGTYPE_ID": 1,
"CAT_NAME": "POP",
"KEYWORDS": null,
"LONG_DESCRIPTION": null,
"SHORT_DESCRIPTION": null,
"TYPE_ID": 76,
"ARTIST_GENDER": null,
"PERFORMER": "Mary J. Blige",
"MAPPINGS": "1_241_558_POP_557_6733_1.5",
"SHORTCODE": "0012139471",
"CATMEDIA_CDATE": "2014-01-26T20:04:46.000Z",
"LANG_ID": 1
},
"_explanation": {
"value": 10.699952,
"description": "max of:",
"details": [
{
"value": 10.699952,
"description":
"weight(DISPLAY_NAME:happy^6.0 in 9092) [PerFieldSimilarity], result of:",
"details": [
{
"value": 10.699952,
"description": "fieldWeight in
9092, product of:",
"details": [
{
"value": 1,
"description":
"tf(freq=1.0), with freq of:",
"details": [
{
"value": 1,
"description":
"termFreq=1.0"
}
]
},
{
"value": 10.699952,
"description":
"idf(docFreq=80, maxDocs=1321663)"
},
{
"value": 1,
"description":
"fieldNorm(doc=9092)"
}
]
}
]
}
]
}
},

    "_score": 10.699952,
    "_source": {
    "DISPLAY_NAME": "Happy",
    "PRICE": 1.5,
    "CHANNEL_ID": 1,
    "CAT_PARENT": 557,
    "MEDIA_ID": 8615,
    "GENRE": "Happy",
    "MEDIA_PKEY": "533022",
    "COMPOSER": null,
    "PLAYER": null,
    "CATMEDIA_NAME": "Happy",
    "FTID": null,
    "VIEW_ID": 241,
    "POSITION": 5685,
    "ITEMCODE": "11927",
    "CAT_ID": 558,
    "PRIORITY": 100,
    "CKEY": 526838,
    "CATMEDIA_RANK": 3,
    "BILLINGTYPE_ID": 1,
    "CAT_NAME": "POP",
    "KEYWORDS": null,
    "LONG_DESCRIPTION": null,
    "SHORT_DESCRIPTION": null,
    "TYPE_ID": 76,
    "ARTIST_GENDER": null,
    "PERFORMER": "Ashanti",
    "MAPPINGS": "1_241_558_POP_557_5685_1.5",
    "SHORTCODE": "0012139036",
    "CATMEDIA_CDATE": "2014-01-26T20:03:44.000Z",
    "LANG_ID": 1
    },
    "_explanation": {
    "value": 10.699952,
    "description": "max of:",
    "details": [
    {
    "value": 10.699952,
    "description":
    "weight(DISPLAY_NAME:happy^6.0 in 11167) [PerFieldSimilarity], result of:",
    "details": [
    {
    "value": 10.699952,
    "description": "fieldWeight in
    11167, product of:",
    "details": [
    {
    "value": 1,
    "description":
    "tf(freq=1.0), with freq of:",
    "details": [
    {
    "value": 1,
    "description":
    "termFreq=1.0"
    }
    ]
    },
    {
    "value": 10.699952,
    "description":
    "idf(docFreq=80, maxDocs=1321663)"
    },
    {
    "value": 1,
    "description":
    "fieldNorm(doc=11167)"
    }
    ]
    }
    ]
    }
    ]
    }
    },

How can #2 and #3 get the same score even though their DISPLAY_NAME
values differ? And how can I swap #2 and #3? I want the results ordered by
relevancy, so I'd expect them to come back in this order:

1) Happy
2) Happy
3) Be Happy

Thanks.
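Ivan's reply above attributes the tie to identical tf, idf and fieldNorm values. The following minimal Python sketch checks that arithmetic against the explain output for #2 and #3. It assumes Lucene 4.x DefaultSimilarity, where tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and relies on the explain output reporting fieldWeight (tf × idf × fieldNorm) as the whole score; the function names are illustrative.

```python
import math


def tf(freq: float) -> float:
    # Lucene 4.x DefaultSimilarity: tf = sqrt(freq)
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    # Lucene 4.x DefaultSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # The explain output reports this product as the entire score
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Inputs copied from the explain output for hits #2 and #3
be_happy = field_weight(freq=1.0, doc_freq=80, max_docs=1321663, field_norm=1.0)  # doc 9092
happy = field_weight(freq=1.0, doc_freq=80, max_docs=1321663, field_norm=1.0)     # doc 11167

print(f"Be Happy: {be_happy:.6f}")
print(f"Happy:    {happy:.6f}")
```

Every input is identical, so the scores are too. The fieldNorm of 1 for "Be Happy" means only one token was indexed for that title, which fits Ivan's later observation that "Be" is removed as a stop word.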

--
Regards,

Chee Hoo



(cyrilforce) #14

Hi Ivan,

Since I'm not sure how an analyzer's stopwords can be set in the query
itself, I tried setting stopwords to "_none_" in the index settings and
the mapping:

Index settings:

{
  "jdbc_dev": {
    "settings": {
      "index.analysis.analyzer.string_lowercase.filter": "lowercase",
      "index.number_of_replicas": "1",
      "index.analysis.analyzer.string_lowercase.tokenizer": "keyword",
      "index.number_of_shards": "5",
      "index.version.created": "900199",
      "index.analysis.analyzer.standard.type": "standard",
      "index.analysis.analyzer.standard.stopwords": "_none_"
    }
  }
}

Type mapping:

{
  "media": {
    "properties": {
      "AUDIO": {
        "type": "string"
      },
      ....
      "DISPLAY_NAME": {
        "type": "string",
        "analyzer": "standard"
      },
      ....
    }
  }
}

Query:

/media/_search?pretty=&search_type=dfs_query_then_fetch&preference=_primary

{
  "from" : 0,
  "size" : 100,
  "explain" : true,
  "query" : {
    "filtered" : {
      "query" : {
        "multi_match": {
          "query": "happy",
          "fields": [ "DISPLAY_NAME" ]
        }
      },
      "filter" : {
        "query" : {
          "bool" : {
            "must" : {
              "term" : {
                "CHANNEL_ID" : "1"
              }
            }
          }
        }
      }
    }
  }
}

Result:

"_shard": 4,
"_node": "xsGVhtTnThaG57_mJdMtxg",
"_index": "jdbc_dev",
"_type": "media",
"_id": "127413",
"_score": 6.614289,
"_source": {
"DISPLAY_NAME": "Be Happy",
"_explanation": {
"value": 6.614289,
"description": "weight(DISPLAY_NAME:happy in 6485)
[PerFieldSimilarity], result of:",
"details": [
{
"value": 6.614289,
"description": "fieldWeight in 6485, product
of:",
"details": [
{
"value": 1,
"description": "tf(freq=1.0), with freq
of:",
"details": [
{
"value": 1,
"description": "termFreq=1.0"
}
]
},
{
"value": 10.582862,
"description": "idf(docFreq=93,
maxDocs=1364306)"
},
{
"value": 0.625,
"description": "fieldNorm(doc=6485)"
}
]
}
]
}

"_shard": 4,
"_node": "UOjX2lxhR6mzfjHHmTm3cQ",
"_index": "jdbc_dev",
"_type": "media",
"_id": "72253",
"_score": 6.614289,
"_source": {
"DISPLAY_NAME": "Happy Ways",
"_explanation": {
"value": 6.614289,
"description": "weight(DISPLAY_NAME:happy in 1102)
[PerFieldSimilarity], result of:",
"details": [
{
"value": 6.614289,
"description": "fieldWeight in 1102, product
of:",
"details": [
{
"value": 1,
"description": "tf(freq=1.0), with freq
of:",
"details": [
{
"value": 1,
"description": "termFreq=1.0"
}
]
},
{
"value": 10.582862,
"description": "idf(docFreq=93,
maxDocs=1364306)"
},
{
"value": 0.625,
"description": "fieldNorm(doc=1102)"
}
]
}
]
}

"_shard": 4,
"_node": "UOjX2lxhR6mzfjHHmTm3cQ",
"_index": "jdbc_dev",
"_type": "media",
"_id": "127413",
"_score": 6.614289,
"_source": {
"DISPLAY_NAME": "Be Happy",
"_explanation": {
"value": 6.614289,
"description": "weight(DISPLAY_NAME:happy in 7277)
[PerFieldSimilarity], result of:",
"details": [
{
"value": 6.614289,
"description": "fieldWeight in 7277, product
of:",
"details": [
{
"value": 1,
"description": "tf(freq=1.0), with freq
of:",
"details": [
{
"value": 1,
"description": "termFreq=1.0"
}
]
},
{
"value": 10.582862,
"description": "idf(docFreq=93,
maxDocs=1364306)"
},
{
"value": 0.625,
"description": "fieldNorm(doc=7277)"
}
]
}
]
}

Notice that items 1, 2 and 3 all have the same score, 6.614289, even though
their DISPLAY_NAME values differ:

  1. Be Happy
  2. Happy Ways
  3. Be Happy

It looks like the number of characters (the text length) isn't taken into
account when the score is computed. I remember the documentation saying
that by default the algorithm gives a higher score to documents with
shorter text in the searched field, but that doesn't seem to be the case
here. Also, I didn't manually disable norms.

Any suggestions on how I could work around this?

On Sat, Apr 5, 2014 at 12:39 PM, chee hoo lum cheehoo84@gmail.com wrote:

Hi Ivan,

I'm trying to disable stopwords, and I'm using an ES version earlier than
1.0. This is the query I ran:

{
  "explain": true,
  "query": {
    "match_phrase": {
      "DISPLAY_NAME": {
        "query": "happy",
        "operator": "and",
        "analyzer": { "stop" : { "type":"stop", "stopwords" : "none" }}
      }
    }
  }
}

However, it throws this error:

"error": "SearchPhaseExecutionException[Failed to execute phase [dfs],

total failure; shardFailures {[kr37FCksStOKW5ZCo6PCwQ][jdbc_dev][0]:
SearchParseException[[jdbc_dev][0]: from[-1],size[-1]: Parse Failure
[Failed to parse source [{\n "explain": true,\n "query": {\n
"match_phrase": {\n "DISPLAY_NAME": {\n "query":
"happy",\n "operator": "and",\n "analyzer": {
"stop" : { "type":"stop", "stopwords" : "none" }}\n }\n
}\n }\n}]]]; nested: QueryParsingException[[jdbc_dev] [match] query does
not support [stopwords]]; }{[VYQt633MTUuJdAwL--PE3A][jdbc_dev][1]:

How do I properly include stopwords = "none" in the query, or is that
unavailable in ES versions earlier than 1.0? I can't find any relevant
information in the documentation. Thanks.

On Fri, Apr 4, 2014 at 10:11 PM, Ivan Brusic ivan@brusic.com wrote:

The number of shards only affects the inverse document frequency. Items
such as the norm are document specific and are not affected by the number
of shards.
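Ivan's point about shards can be seen in the Apr 2 explain output: the two single-word "Happy" hits (doc 23025 and doc 11167) report different docFreq and maxDocs values, which suggests their statistics came from different shards. A minimal Python sketch, assuming Lucene 4.x DefaultSimilarity's idf = 1 + ln(maxDocs / (docFreq + 1)), reproduces both idf values. The pooled figure is a hypothetical illustration of what aggregating statistics does, not a number Elasticsearch reported.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # Lucene 4.x DefaultSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Per-shard statistics copied from the Apr 2 explain output
idf_doc_23025 = idf(doc_freq=58, max_docs=1249243)
idf_doc_11167 = idf(doc_freq=80, max_docs=1321663)

# Illustration only: pool the statistics of just these two shards, roughly
# what a DFS phase does across all shards before scoring. The real index has
# five shards, so this pooled value is not what Elasticsearch would compute.
idf_pooled = idf(doc_freq=58 + 80, max_docs=1249243 + 1321663)

print(f"doc 23025: {idf_doc_23025:.6f}")
print(f"doc 11167: {idf_doc_11167:.6f}")
print(f"pooled:    {idf_pooled:.6f}")
```

With tf and fieldNorm both equal to 1 for these hits, the idf difference alone accounts for 10.960511 versus 10.699952. The Apr 4 results, run with search_type=dfs_query_then_fetch, all report the same idf(docFreq=501, maxDocs=6385732), consistent with every shard scoring against aggregated statistics.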

I did not notice it before, but the document with DISPLAY_NAME of "Be
Happy" is probably scoring the same as the others because "Be" is a stop
word and therefore removed from the index. You end up matching Happy with
Happy, which is the same as the other documents.

Try using an analyzer without stopwords. Query tuning is hard work.
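To see why a stop word changes length normalization, here is a toy Python sketch. It is not the Lucene standard analyzer: it only lowercases and splits on non-word characters, and `ENGLISH_STOP_SUBSET` is a small hand-picked subset of Lucene's default English stop words (which does include "be"). The raw length norm is 1 / sqrt(number of tokens), before Lucene compresses it into a single byte.

```python
import math
import re

# Assumption: a small subset of Lucene's default English stop word list,
# just enough for this example.
ENGLISH_STOP_SUBSET = frozenset({"a", "an", "and", "be", "is", "the", "to"})


def analyze(text, stopwords=frozenset()):
    # Toy stand-in for an analyzer: lowercase, split on non-word
    # characters, then drop stop words.
    tokens = [t for t in re.split(r"\W+", text.lower()) if t]
    return [t for t in tokens if t not in stopwords]


def raw_length_norm(tokens):
    # DefaultSimilarity length norm before byte encoding: 1 / sqrt(numTerms)
    return 1.0 / math.sqrt(len(tokens))


for title in ["Happy", "Be Happy"]:
    kept = analyze(title, ENGLISH_STOP_SUBSET)
    all_tokens = analyze(title)
    print(f"{title!r}: stop words removed {kept} -> {raw_length_norm(kept):.3f}; "
          f"kept {all_tokens} -> {raw_length_norm(all_tokens):.3f}")
```

With "be" removed, "Be Happy" is reduced to the same single token as "Happy", so the scorer cannot tell the two titles apart.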

Cheers,

Ivan

On Fri, Apr 4, 2014 at 2:46 AM, chee hoo lum cheehoo84@gmail.com wrote:

Hi,

I discovered that the score values are influenced by the shards and nodes
where the documents are stored.

So I specified preference and search_type in the search request, but I
still have no idea how to get the results I want.

The query:

/media/_search?pretty=&search_type=dfs_query_then_fetch&preference=_primary

{
  "from" : 0,
  "size" : 100,
  "explain" : true,
  "query" : {
    "filtered" : {
      "query" : {
        "multi_match": {
          "query": "happy",
          "fields": [ "DISPLAY_NAME" ]
        }
      },
      "filter" : {
        "query" : {
          "bool" : {
            "must" : {
              "term" : {
                "CHANNEL_ID" : "1"
              }
            }
          }
        }
      }
    }
  }
}

Results:

    "_shard": 0,
    "_node": "kr37FCksStOKW5ZCo6PCwQ",
    "_index": "jdbc_dev",
    "_type": "media",
    "_id": "27071",
    "_score": 10.450976,
    "_source": {
    "DISPLAY_NAME": "Happy",
    "PRICE": 1.5,
    ....
    "_explanation": {
    "value": 10.450976,
    "description": "weight(DISPLAY_NAME:happy in 2210)
    [PerFieldSimilarity], result of:",
    "details": [
    {
    "value": 10.450976,
    "description": "fieldWeight in 2210, product
    of:",
    "details": [
    {
    "value": 1,
    "description": "tf(freq=1.0), with
    freq of:",
    "details": [
    {
    "value": 1,
    "description": "termFreq=1.0"
    }
    ]
    },
    {
    "value": 10.450976,
    "description": "idf(docFreq=501,
    maxDocs=6385732)"
    },
    {
    "value": 1,
    "description": "fieldNorm(doc=2210)"
    }
    ]
    }
    ]
    }


(Ivan Brusic) #15

Lucene will indeed, by default, give a higher score to shorter text, but
the "shortness" is the number of tokens, not the number of characters. In
your last example, each field has two tokens, so the length is the same.
The term frequency is also the same for each document ("happy" appears
once) and the inverse document frequency is the same (always the case with
single-term queries), so the score will be exactly the same for every
document. Why should the scoring be any different?

Cheers,

Ivan
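Ivan's point can be checked numerically. The 0.625 in the explain output is the length norm 1 / sqrt(2) for a two-token field after Lucene compresses it into one byte. Below is a Python port, written for illustration, of the one-byte encoding that Lucene 4.x's DefaultSimilarity applies to norms (`SmallFloat.floatToByte315` and `byte315ToFloat`). It assumes a field boost of 1 and is a sketch, not a reference implementation.

```python
import math
import struct


def _float_bits(f: float) -> int:
    # Reinterpret f as a 32-bit IEEE float and return its bits as a signed int
    return struct.unpack(">i", struct.pack(">f", f))[0]


def _bits_float(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def float_to_byte315(f: float) -> int:
    # Keep the exponent and the top 2 explicit mantissa bits; truncate the rest
    bits = _float_bits(f)
    small = bits >> 21
    zero = (63 - 15) << 3
    if small <= zero:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> smallest
    if small >= zero + 0x100:
        return 0xFF  # overflow -> largest value
    return small - zero


def byte315_to_float(b: int) -> float:
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return _bits_float(bits)


def stored_norm(num_tokens: int) -> float:
    # Length norm 1 / sqrt(numTerms) as it comes back after byte encoding
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_tokens)))


for n in (1, 2, 3, 4):
    print(n, stored_norm(n))

# tf * idf * fieldNorm for the two-token "Be Happy" / "Happy Ways" hits
score = 1.0 * 10.582862 * stored_norm(2)
print(f"{score:.6f}")
```

Because the encoding truncates, different lengths can share a stored norm (3 and 4 tokens both come out as 0.5). Under the same idf, a one-token "Happy" title would get a norm of 1.0 and score about 10.58, above any two-token title.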

On Fri, Apr 4, 2014 at 10:31 PM, chee hoo lum cheehoo84@gmail.com wrote:

Hi Ivan,

Since i not sure how analyzer with stopwords can be set in the query
itself. I tried to set the stopwords="none" via
index and its mapping :

*Index settings: *

{
"jdbc_dev": {
"settings": {
"index.analysis.analyzer.string_lowercase.filter": "lowercase",
"index.number_of_replicas": "1",
"index.analysis.analyzer.string_lowercase.tokenizer":
"keyword",
"index.number_of_shards": "5",
"index.version.created": "900199",
* "index.analysis.analyzer.standard.type": "standard",*

  •        "index.analysis.analyzer.standard.stopwords": "_none_"*
      }
    
    }
    }

Type Mapping :

{
"media": {
"properties": {
"AUDIO": {
"type": "string"
},
....
"DISPLAY_NAME": {
"type": "string",
* "analyzer": "standard"*
},
....
}
}

*Query : *

/media/_search?pretty=&search_type=dfs_query_then_fetch&
preference=_primary

{
"from" : 0,
"size" : 100,
"explain" : true,
"query" : {

"filtered" : {
  "query" : {
     "multi_match": {
   "query": "happy",
   "fields": [ "DISPLAY_NAME" ]
}
  },
  "filter" : {
    "query" : {
      "bool" : {
      "must" : {
        "term" : {
          "CHANNEL_ID" : "1"
        }
      }
    }
    }
  }
}

}

}

Result:

"_shard": 4,
"_node": "xsGVhtTnThaG57_mJdMtxg",
"_index": "jdbc_dev",
"_type": "media",
"_id": "127413",
"_score": 6.614289,
"_source": {
  "DISPLAY_NAME": "Be Happy",
  ....
},
"_explanation": {
  "value": 6.614289,
  "description": "weight(DISPLAY_NAME:happy in 6485) [PerFieldSimilarity], result of:",
  "details": [
    {
      "value": 6.614289,
      "description": "fieldWeight in 6485, product of:",
      "details": [
        {
          "value": 1,
          "description": "tf(freq=1.0), with freq of:",
          "details": [
            {
              "value": 1,
              "description": "termFreq=1.0"
            }
          ]
        },
        {
          "value": 10.582862,
          "description": "idf(docFreq=93, maxDocs=1364306)"
        },
        {
          "value": 0.625,
          "description": "fieldNorm(doc=6485)"
        }
      ]
    }
  ]
}

"_shard": 4,
"_node": "UOjX2lxhR6mzfjHHmTm3cQ",
"_index": "jdbc_dev",
"_type": "media",
"_id": "72253",
"_score": 6.614289,
"_source": {
  "DISPLAY_NAME": "Happy Ways",
  ....
},
"_explanation": {
  "value": 6.614289,
  "description": "weight(DISPLAY_NAME:happy in 1102) [PerFieldSimilarity], result of:",
  "details": [
    {
      "value": 6.614289,
      "description": "fieldWeight in 1102, product of:",
      "details": [
        {
          "value": 1,
          "description": "tf(freq=1.0), with freq of:",
          "details": [
            {
              "value": 1,
              "description": "termFreq=1.0"
            }
          ]
        },
        {
          "value": 10.582862,
          "description": "idf(docFreq=93, maxDocs=1364306)"
        },
        {
          "value": 0.625,
          "description": "fieldNorm(doc=1102)"
        }
      ]
    }
  ]
}

"_shard": 4,
"_node": "UOjX2lxhR6mzfjHHmTm3cQ",
"_index": "jdbc_dev",
"_type": "media",
"_id": "127413",
"_score": 6.614289,
"_source": {
  "DISPLAY_NAME": "Be Happy",
  ....
},
"_explanation": {
  "value": 6.614289,
  "description": "weight(DISPLAY_NAME:happy in 7277) [PerFieldSimilarity], result of:",
  "details": [
    {
      "value": 6.614289,
      "description": "fieldWeight in 7277, product of:",
      "details": [
        {
          "value": 1,
          "description": "tf(freq=1.0), with freq of:",
          "details": [
            {
              "value": 1,
              "description": "termFreq=1.0"
            }
          ]
        },
        {
          "value": 10.582862,
          "description": "idf(docFreq=93, maxDocs=1364306)"
        },
        {
          "value": 0.625,
          "description": "fieldNorm(doc=7277)"
        }
      ]
    }
  ]
}

Notice that items 1, 2 and 3 all have the same score (6.614289) even
though their DISPLAY_NAME values are different:

  1. Be Happy
  2. Happy Ways
  3. Be Happy

It looks like the number of characters (the field length) is not taken into
account when the score is computed. I remember reading somewhere in the
documentation that, by default, the algorithm gives a higher score to
documents with shorter text in the searched field, but that doesn't seem to
be the case here. I also didn't manually disable norms.

Any suggestions on how I could work around this?



(cyrilforce) #16

Hi Ivan,

Because I want results like these to be sorted in this order:

  1. Be happy
  2. Be happy
  3. Happy ways

Currently they are sorted like this:

  1. Be happy
  2. Happy ways
  3. Be happy

because they all get the same score. Any suggestions?

Thanks



(Ivan Brusic) #17

You can index the number of characters in your string into a new field and
then do a secondary sort on this field.

Are you testing against real data or only against a test set? The Lucene
scoring model works better as more documents are added: the term
frequencies and inverse document frequencies start to diverge and
contribute more to the score, so you should see fewer documents with
identical scores.

--
Ivan
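The secondary-sort suggestion can be sketched in Python in two parts: compute a character count when building each document, and list that field after `_score` in the `sort` clause. `DISPLAY_NAME_LENGTH` is a hypothetical field name (it would need an integer mapping). The request body is built as a plain dict rather than sent through any particular client. The `sort_hits` helper only imitates the ordering Elasticsearch would apply, so the effect can be checked offline.

```python
import json

LENGTH_FIELD = "DISPLAY_NAME_LENGTH"  # hypothetical field, mapped as "integer"


def with_length_field(doc: dict) -> dict:
    # Copy the document and add the character count of DISPLAY_NAME
    # (0 when the name is missing or null).
    enriched = dict(doc)
    enriched[LENGTH_FIELD] = len(doc.get("DISPLAY_NAME") or "")
    return enriched


def search_body(text: str, channel_id: str) -> dict:
    # Same filtered/multi_match query as in the thread, plus a secondary sort:
    # highest score first, then the shortest DISPLAY_NAME among equal scores.
    return {
        "query": {
            "filtered": {
                "query": {"multi_match": {"query": text, "fields": ["DISPLAY_NAME"]}},
                "filter": {"query": {"bool": {"must": {"term": {"CHANNEL_ID": channel_id}}}}},
            }
        },
        "sort": ["_score", {LENGTH_FIELD: {"order": "asc"}}],
    }


def sort_hits(hits: list) -> list:
    # Offline imitation of the sort above: score descending, length ascending.
    return sorted(hits, key=lambda h: (-h["_score"], h["_source"][LENGTH_FIELD]))


if __name__ == "__main__":
    print(json.dumps(search_body("happy", "1"), indent=2))
```

With the three equally scored hits from the thread, this puts both "Be Happy" hits (8 characters) ahead of "Happy Ways" (10 characters). Existing documents would have to be reindexed to get the new field.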



(cyrilforce) #18

Hi Ivan,

Hmm... this seems like a viable workaround, but I wanted to know whether
there are any other ways to do it. I don't think this is a unique problem,
since I'd guess most users expect search results to be ordered like this:

  1. Happy
  2. Be Happy
  3. Be Happy
  4. Happy Together

It is live production data: 180k documents in 5 shards across 5 nodes,
with one replica each. Even with 180k documents I still have this ordering
issue, plus an inconsistency issue because results are fetched from
primaries and replicas intermittently. To fix the inconsistency I use

/media/_search?pretty=&search_type=dfs_query_then_fetch&preference=_primary

so now the sorting is the only problem left to solve.

Thanks.

--
Regards,

Chee Hoo
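The `dfs_query_then_fetch` part of that request deals with a related effect. By default, each shard scores its hits using only its own term statistics. As a result, the same title can receive a different idf, and therefore a different score, depending on which shard holds it. The DFS phase first gathers document frequencies from all shards and then scores with the combined numbers. Primaries and replicas can also drift apart, since deleted documents generally keep counting toward the statistics until segments are merged, and each copy merges independently. The Python sketch below illustrates only the per-shard versus combined idf, using made-up shard counts (the `ShardStats` values are invented, not taken from the thread) and the same classic idf formula as in the explain output.

```python
import math
from dataclasses import dataclass


@dataclass
class ShardStats:
    doc_freq: int  # documents on this shard containing the term
    max_docs: int  # documents on this shard (deleted-but-unmerged ones included)


def idf(doc_freq: int, max_docs: int) -> float:
    # Classic Lucene idf, as shown in the explain output.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def local_idfs(shards: list) -> list:
    # query_then_fetch: each shard uses only its own statistics.
    return [idf(s.doc_freq, s.max_docs) for s in shards]


def global_idf(shards: list) -> float:
    # dfs_query_then_fetch: statistics are summed across shards first.
    return idf(sum(s.doc_freq for s in shards), sum(s.max_docs for s in shards))


if __name__ == "__main__":
    # Hypothetical numbers: "happy" is unevenly spread over two shards.
    shards = [ShardStats(doc_freq=10, max_docs=40_000), ShardStats(doc_freq=60, max_docs=40_000)]
    print("local:", [round(v, 4) for v in local_idfs(shards)])
    print("global:", round(global_idf(shards), 4))
```

With uneven shards, the two local idf values disagree, and the combined value falls between them. That combined value is the one every hit gets under DFS.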



(Ivan Brusic) #19

I do not think most users would expect the results in that order. In most
cases character length does not indicate relevance: why is a shorter word
more relevant? I would say that most people would rank "Happy Together"
higher, since word proximity is a helpful metric. "Happy" should already
rank first because of the length norm.

You can always play around with the function_score query, but I would rather
deal with non-dynamic metrics at indexing time.

--
Ivan
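For comparison, the function_score route might look roughly like the body below. It wraps the same query and multiplies its score by the reciprocal of the hypothetical `DISPLAY_NAME_LENGTH` field. This assumes an Elasticsearch release that supports `field_value_factor` with the `reciprocal` modifier, so check the function_score documentation for your version before relying on it. The channel filter is left out for brevity. The `combined_score` helper reproduces the multiplication locally, and the text scores in the demo are assumptions: the two-token value comes from the thread, and the one-token value assumes a norm of 1.0. As noted above, this changes every score rather than only breaking ties.

```python
import json

LENGTH_FIELD = "DISPLAY_NAME_LENGTH"  # hypothetical integer field


def function_score_body(text: str) -> dict:
    # Multiply the text score by 1 / DISPLAY_NAME_LENGTH.
    return {
        "query": {
            "function_score": {
                "query": {"multi_match": {"query": text, "fields": ["DISPLAY_NAME"]}},
                "field_value_factor": {"field": LENGTH_FIELD, "modifier": "reciprocal"},
                "boost_mode": "multiply",
            }
        }
    }


def combined_score(text_score: float, length: int) -> float:
    # Local equivalent of the multiplication above (length must be > 0).
    return text_score * (1.0 / length)


if __name__ == "__main__":
    print(json.dumps(function_score_body("happy"), indent=2))
    # Assumed text scores: one token (norm 1.0) versus two tokens (norm 0.625).
    hits = {"Happy": 10.582862, "Be Happy": 6.614289,
            "Happy Ways": 6.614289, "Happy Together": 6.614289}
    ranked = sorted(hits, key=lambda t: -combined_score(hits[t], len(t)))
    print(ranked)
```

Here the ranking comes out as "Happy", "Be Happy", "Happy Ways", "Happy Together". An empty name (length 0) would need special handling, which is one reason to prefer the index-time field with a secondary sort.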



(cyrilforce) #20

Thanks Ivan!

On Tue, Apr 8, 2014 at 1:09 PM, Ivan Brusic ivan@brusic.com wrote:

I do not think most users would expect the results in that order. The
character length does not provide relevance for most cases. Why is a
shorter word more relevant? I would say that most would rank "Happy
Together" higher since word proximity is a helpful metric. Happy should
rank first due to the length norm.

You can always play around with the function score query, but I would rather
deal with non-dynamic metrics at indexing time.

--
Ivan

On Mon, Apr 7, 2014 at 8:23 AM, chee hoo lum cheehoo84@gmail.com wrote:

Hi Ivan,

Hmm... This seems like a viable workaround; however, I just wanted to know
whether there are any other ways to do it. This doesn't seem like a unique
problem, and I guess most users would expect search results to be sorted by
similarity in the following order:

1. Happy
2. Be Happy
3. Be Happy
4. Happy Together

It is live data in production. I have 180k documents residing in 5 shards
across 5 nodes, with one replica each. Even with 180k documents I still have
this similarity-order issue, coupled with an inconsistency issue because the
results are fetched from the primary and replica shards intermittently.
Therefore I need to use
/media/_search?pretty=&search_type=dfs_query_then_fetch&preference=_primary
to solve the inconsistency, and now only this sorting issue is left to solve.
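The role of `search_type=dfs_query_then_fetch` can be sketched with the classic Lucene idf formula, idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduces the 10.582862 seen in the explanations quoted later in this thread. With the default query_then_fetch, each shard scores with its own local docFreq/maxDocs; the DFS phase first collects those statistics from every shard so that all shards use one global idf. The per-shard split below is hypothetical, chosen only so that it sums to docFreq=93 and maxDocs=1364306. (`preference=_primary` addresses a related but separate effect: deleted documents keep counting in these statistics until their segments are merged, and primaries and replicas merge independently.)

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene (DefaultSimilarity) inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Hypothetical per-shard statistics for the term "happy": shard -> (docFreq, maxDocs).
shards = {
    0: (10, 270_000),
    1: (30, 275_000),
    2: (15, 272_000),
    3: (20, 273_000),
    4: (18, 274_306),
}

# query_then_fetch: every shard scores with its own local idf.
local_idf = {shard: idf(df, n) for shard, (df, n) in shards.items()}

# dfs_query_then_fetch: statistics are summed first, so all shards share one idf.
global_df = sum(df for df, _ in shards.values())
global_n = sum(n for _, n in shards.values())
global_idf = idf(global_df, global_n)

for shard, value in sorted(local_idf.items()):
    print(f"shard {shard}: local idf={value:.4f}  global idf={global_idf:.4f}")
```

Identical documents on different shards would therefore get different scores without the DFS phase, which is one source of the ordering flipping between requests.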

Thanks.

On Mon, Apr 7, 2014 at 7:13 AM, Ivan Brusic ivan@brusic.com wrote:

You can index the number of characters in your string into a new field
and then do a secondary sort on this field.
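A minimal sketch of this suggestion, written in Python rather than raw JSON so it can run without a cluster. `DISPLAY_NAME_LENGTH` is a hypothetical field name that the indexing code would fill with the character count of DISPLAY_NAME; the search body sorts on `_score` first and uses the length only as a tie-breaker. The local `rank` helper (also hypothetical) mimics that ordering on the three tied hits from this thread.

```python
import json


def with_length(doc: dict) -> dict:
    """Add the hypothetical DISPLAY_NAME_LENGTH field at indexing time."""
    return {**doc, "DISPLAY_NAME_LENGTH": len(doc["DISPLAY_NAME"])}


# Search body: relevance first, shorter display names break ties.
search_body = {
    "query": {"multi_match": {"query": "happy", "fields": ["DISPLAY_NAME"]}},
    "sort": [
        {"_score": {"order": "desc"}},
        {"DISPLAY_NAME_LENGTH": {"order": "asc"}},
    ],
}


def rank(hits: list) -> list:
    """Local stand-in for the sort above: score descending, then length ascending."""
    return sorted(hits, key=lambda h: (-h["_score"], h["_source"]["DISPLAY_NAME_LENGTH"]))


hits = [
    {"_score": 6.614289, "_source": with_length({"DISPLAY_NAME": "Be Happy"})},
    {"_score": 6.614289, "_source": with_length({"DISPLAY_NAME": "Happy Ways"})},
    {"_score": 6.614289, "_source": with_length({"DISPLAY_NAME": "Be Happy"})},
]

print(json.dumps(search_body, indent=2))
print([h["_source"]["DISPLAY_NAME"] for h in rank(hits)])
# prints ['Be Happy', 'Be Happy', 'Happy Ways']
```

Since the length is computed once when the document is indexed, the query stays a plain sort and no per-request scripting is needed.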

Are you testing against real data or only against some test set? The
Lucene scoring model will improve with the addition of more documents. As
more documents are added, the term frequencies and inverse document
frequencies start to diverge and contribute more to the scoring. You will
not have many documents with the same score.

--
Ivan

On Sun, Apr 6, 2014 at 12:38 AM, cheehoo84@gmail.com wrote:

Hi Ivan,

Because I want similar results sorted this way:

  1. Be happy
  2. Be happy
  3. Happy ways

Currently they are sorted:

  1. Be happy
  2. Happy ways
  3. Be happy

This is because they all return the same score. Any suggestions?

Thanks

On 6 Apr, 2014, at 4:24 am, Ivan Brusic ivan@brusic.com wrote:

Lucene will indeed, by default, give a higher score to shorter text,
but the "shortness" is the number of tokens, not the number of characters.
In your last example, each field has two tokens, so the length is the same.
The term frequency is also the same for each document ("happy" appears
once) and the inverse document frequency is the same (always the case with
single-term queries), so the score will be exactly the same for every
document. Why should the scoring be any different?
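This can be checked against the `_explanation` output quoted further down in this thread. A sketch, assuming Lucene 4.x's classic DefaultSimilarity (the TF-IDF model behind those explanations): fieldWeight = tf × idf × fieldNorm, with tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)) and fieldNorm = 1/sqrt(number of tokens), stored in a single byte. The two encoding helpers are a Python port of Lucene's SmallFloat.floatToByte315/byte315ToFloat; that lossy byte is why a two-token field gets 0.625 rather than 0.707, and why three- and four-token fields end up with the same norm.

```python
import math
import struct


def float_to_byte315(f: float) -> int:
    """Port of Lucene SmallFloat.floatToByte315 (3 mantissa bits, zero exponent 15).

    Returns the byte as an unsigned value 0..255 (Lucene's -1 overflow maps to 255).
    """
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw float32 bits
    small = bits >> 21
    fzero = (63 - 15) << 3
    if small <= fzero:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> smallest
    if small >= fzero + 0x100:
        return 255  # overflow -> largest
    return small - fzero


def byte315_to_float(b: int) -> float:
    """Port of Lucene SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = (b << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def field_norm(num_tokens: int) -> float:
    """Length norm 1/sqrt(numTokens) after the lossy one-byte round trip."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_tokens)))


def field_weight(freq: int, doc_freq: int, max_docs: int, num_tokens: int) -> float:
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    return tf * idf * field_norm(num_tokens)


# "Be Happy" and "Happy Ways" both have 2 tokens, so they tie:
print(field_norm(2))                              # prints 0.625
print(round(field_weight(1, 93, 1364306, 2), 6))  # prints 6.614289
```

Every input to the formula is identical for the tied documents, so only an extra sort key (or different indexed data) can separate them.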

Cheers,

Ivan

On Fri, Apr 4, 2014 at 10:31 PM, chee hoo lum cheehoo84@gmail.com wrote:

Hi Ivan,

Since I'm not sure how an analyzer with stopwords can be set in the query
itself, I tried to set stopwords to "_none_" via the index settings and the
mapping (the changed entries are the two "standard" analyzer settings and the
DISPLAY_NAME analyzer):

Index settings:

{
  "jdbc_dev": {
    "settings": {
      "index.analysis.analyzer.string_lowercase.filter": "lowercase",
      "index.number_of_replicas": "1",
      "index.analysis.analyzer.string_lowercase.tokenizer": "keyword",
      "index.number_of_shards": "5",
      "index.version.created": "900199",
      "index.analysis.analyzer.standard.type": "standard",
      "index.analysis.analyzer.standard.stopwords": "_none_"
    }
  }
}

Type mapping:

{
  "media": {
    "properties": {
      "AUDIO": {
        "type": "string"
      },
      ....
      "DISPLAY_NAME": {
        "type": "string",
        "analyzer": "standard"
      },
      ....
    }
  }
}
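Why the stopword setting matters here: the length norm counts tokens, not characters, and stopwords removed by the analyzer are not counted. A rough sketch, assuming a simplified tokenizer (lowercase, split on runs of letters and digits) as a stand-in for the real standard analyzer, and a hypothetical one-word stop set {"be"} (Lucene's default English stop list does contain "be"). The norms printed are the raw 1/sqrt(tokens) values, before Lucene's one-byte encoding.

```python
import math
import re


def tokens(text: str, stopwords: frozenset = frozenset()) -> list:
    """Very rough stand-in for the standard analyzer: lowercase, split, drop stopwords."""
    words = re.findall(r"[0-9a-z]+", text.lower())
    return [w for w in words if w not in stopwords]


def length_norm(text: str, stopwords: frozenset = frozenset()) -> float:
    """Unencoded classic Lucene length norm: 1 / sqrt(number of tokens)."""
    return 1.0 / math.sqrt(len(tokens(text, stopwords)))


STOP = frozenset({"be"})  # hypothetical one-word stop set, for illustration only

for title in ["Happy", "Be Happy", "Happy Ways", "Happy Together"]:
    print(f"{title!r}: chars={len(title)}, "
          f"tokens={len(tokens(title))} (norm {length_norm(title):.3f}), "
          f"with stopwords removed={len(tokens(title, STOP))} "
          f"(norm {length_norm(title, STOP):.3f})")
```

With stopwords kept, "Be Happy", "Happy Ways" and "Happy Together" all have two tokens and therefore the same norm, regardless of their character lengths.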

Query:

/media/_search?pretty=&search_type=dfs_query_then_fetch&preference=_primary

{
  "from": 0,
  "size": 100,
  "explain": true,
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "happy",
          "fields": [ "DISPLAY_NAME" ]
        }
      },
      "filter": {
        "query": {
          "bool": {
            "must": {
              "term": {
                "CHANNEL_ID": "1"
              }
            }
          }
        }
      }
    }
  }
}

Result:

"_shard": 4,
"_node": "xsGVhtTnThaG57_mJdMtxg",
"_index": "jdbc_dev",
"_type": "media",
"_id": "127413",
"_score": 6.614289,
"_source": { "DISPLAY_NAME": "Be Happy" },
"_explanation": {
  "value": 6.614289,
  "description": "weight(DISPLAY_NAME:happy in 6485) [PerFieldSimilarity], result of:",
  "details": [
    {
      "value": 6.614289,
      "description": "fieldWeight in 6485, product of:",
      "details": [
        {
          "value": 1,
          "description": "tf(freq=1.0), with freq of:",
          "details": [
            { "value": 1, "description": "termFreq=1.0" }
          ]
        },
        { "value": 10.582862, "description": "idf(docFreq=93, maxDocs=1364306)" },
        { "value": 0.625, "description": "fieldNorm(doc=6485)" }
      ]
    }
  ]
}

"_shard": 4,
"_node": "UOjX2lxhR6mzfjHHmTm3cQ",
"_index": "jdbc_dev",
"_type": "media",
"_id": "72253",
"_score": 6.614289,
"_source": { "DISPLAY_NAME": "Happy Ways" },
"_explanation": {
  "value": 6.614289,
  "description": "weight(DISPLAY_NAME:happy in 1102) [PerFieldSimilarity], result of:",
  "details": [
    {
      "value": 6.614289,
      "description": "fieldWeight in 1102, product of:",
      "details": [
        {
          "value": 1,
          "description": "tf(freq=1.0), with freq of:",
          "details": [
            { "value": 1, "description": "termFreq=1.0" }
          ]
        },
        { "value": 10.582862, "description": "idf(docFreq=93, maxDocs=1364306)" },
        { "value": 0.625, "description": "fieldNorm(doc=1102)" }
      ]
    }
  ]
}

"_shard": 4,
"_node": "UOjX2lxhR6mzfjHHmTm3cQ",
"_index": "jdbc_dev",
"_type": "media",
"_id": "127413",
"_score": 6.614289,
"_source": { "DISPLAY_NAME": "Be Happy" },
"_explanation": {
  "value": 6.614289,
  "description": "weight(DISPLAY_NAME:happy in 7277) [PerFieldSimilarity], result of:",
  "details": [
    {
      "value": 6.614289,
      "description": "fieldWeight in 7277, product of:",
      "details": [
        {
          "value": 1,
          "description": "tf(freq=1.0), with freq of:",
          "details": [
            { "value": 1, "description": "termFreq=1.0" }
          ]
        },
        { "value": 10.582862, "description": "idf(docFreq=93, maxDocs=1364306)" },
        { "value": 0.625, "description": "fieldNorm(doc=7277)" }
      ]
    }
  ]
}

Notice that items 1, 2 and 3 all have the same score, 6.614289, even
though their DISPLAY_NAME values differ:

  1. Be Happy
  2. Happy Ways
  3. Be Happy

It looks like the number of characters (the length) isn't taken into
consideration when the score is computed. I remember reading somewhere in the
documentation that, by default, the algorithm gives a higher score to
documents with shorter text in the searched field, but that doesn't seem to
be the case here. Also, I didn't manually disable norms.

Any suggestions on how I could work around this issue?

--
Regards,

Chee Hoo

--
Regards,

Chee Hoo
