Results scoring


(Кирилл Шнуров) #1

I'm moving from Sphinx to ES and I can't get ES to score results properly.

Mapping:

  "properties" : {
    "address" : {
      "type" : "string",
      "boost" : 20.0,
      "analyzer" : "street_analyzer_index",
      "omit_term_freq_and_positions" : true,
      "search_quote_analyzer" : "street_analyzer_search"
    },
    "metro" : {
      "properties" : {
        "name_ru" : {
          "type" : "string",
          "boost" : 10.0,
          "analyzer" : "street_analyzer_index",
          "omit_term_freq_and_positions" : true,
          "search_quote_analyzer" : "street_analyzer_search"
        }
      }
    },
    "name_en" : {
      "type" : "string",
      "boost" : 20.0,
      "analyzer" : "word_analyzer_index",
      "omit_term_freq_and_positions" : true,
      "search_quote_analyzer" : "word_analyzer_search"
    },
    "name_ru" : {
      "type" : "string",
      "boost" : 20.0,
      "index_analyzer" : "word_analyzer_index",
      "search_analyzer" : "word_analyzer_search",
      "omit_term_freq_and_positions" : true
    }

Query (with English example terms instead of the Russian ones):

curl -X GET "http://localhost:9200/tomesto_places/place/_search?from=0&page=1&per_page=7&size=7&pretty=true" -d '{"query":{"custom_filters_score":{"query":{"dis_max":{"queries":[{"bool":{"should":[{"text":{"name_en":{"query":"bar 5th avenue","analyzer":"word_analyzer_search","fuzziness":0.8,"operator":"or","boost":20}}},{"text":{"name_en":{"query":"bar 5th avenue","analyzer":"word_analyzer_search","fuzziness":0.8,"operator":"or","boost":20}}},{"text":{"address":{"query":"bar 5th avenue","analyzer":"street_analyzer_search","fuzziness":0.8,"operator":"or","boost":10}}},{"text":{"metro.name_ru":{"query":"bar 5th avenue","analyzer":"street_analyzer_search","fuzziness":0.8,"operator":"or","boost":20}}}]}}]}}}}'

Results:

Great bar jackson st (score 43505656000.0 ??!!!)

awesome bar madelyn road (score 41732174200.0 ??!!!)

...

And nothing about "5th avenue"! I'm expecting items with name 'bar bla bla' and address '5th avenue 12' to be scored higher. Queries for just '5th avenue' get a tiny score of ~4000. What am I doing wrong, and how can I get proper result ranking? In the same case Sphinx works out of the box, ranking everything very well in extended match mode.

--


(simonw-2) #2

Hey man,

I don't think your query is doing what you think it is doing. You are
searching across 3 fields with 3 boolean queries for "bar 5th avenue", including
fuzziness and without a rewrite method. That means you are actually
searching with "constant_score" based on your boost. Depending on how many
terms you match in your text query, some scores will shoot through the roof!
It's hard to tell why this score came up, but if you could provide some
explain output I could tell you more
(http://www.elasticsearch.org/guide/reference/api/search/explain.html)
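For reference, a minimal sketch of a search body that asks for explain output. The "explain" flag is the one described on the page linked above; the single "text" clause is lifted from the original query and shortened to "5th avenue", so treat the exact clause as an illustration rather than the poster's real request.

```json
{
  "explain": true,
  "query": {
    "text": {
      "address": {
        "query": "5th avenue",
        "analyzer": "street_analyzer_search",
        "operator": "or"
      }
    }
  }
}
```

Each hit should then carry an "_explanation" tree showing how the score was assembled, which is where values such as fieldNorm appear.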

I think I can guess what you are trying to do (though I'd likely approach it
differently), but as short-term advice I'd add
"fuzzy_rewrite" : "scoring_boolean" next to your "fuzziness" : 0.8 to
check if that helps. I also recommend looking into the "shingle filter", used
in an additional query clause, to improve your "precision".
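A sketch of what that short-term change might look like on one of the clauses from the original query. Only the "fuzzy_rewrite" line is new; everything else is copied from the posted request. Whether it changes the scores will depend on the ES version and on the rest of the query.

```json
{
  "text": {
    "name_en": {
      "query": "bar 5th avenue",
      "analyzer": "word_analyzer_search",
      "fuzziness": 0.8,
      "fuzzy_rewrite": "scoring_boolean",
      "operator": "or",
      "boost": 20
    }
  }
}
```

The same line would presumably need to be added to every fuzzy "text" clause in the "should" list.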

simon
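As an illustration of the shingle suggestion above, here is a rough sketch of index settings that define a two-word shingle filter. The names "address_shingle" and "shingle_analyzer" are hypothetical, and the tokenizer and lowercase filter are assumptions; the poster's own street_analyzer_* definitions are not shown in the thread.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "address_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "address_shingle"]
        }
      }
    }
  }
}
```

A field analyzed this way indexes word pairs such as "5th avenue" as single terms, so an extra query clause against it can reward documents where the words occur together.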
On Wednesday, October 24, 2012 9:47:12 AM UTC+2, Кирилл Шнуров wrote:

I'm moving from Sphinx to ES and I can't make ES to score results properly.

--


(Кирилл Шнуров) #3

Thanks for the reply!
Actually, I'm searching through 9 fields; I truncated the example. Here's
the full query with explain: http://gist.github.com/3945005
The score gets boosted because of fieldNorm:

     "value" : 7.5161928E9,
     "description" : "fieldNorm(field=fields.value, doc=122)"

Looks like I'm using ES the wrong way. I'm new to it, so I'll try to
describe my task.
When a user types something like "italian restaurant with wi-fi on 5th avenue",
I want to find the document with:
address (string) - 5th avenue
kitchens (array of strings) - italian
categories (array of strings) - restaurant
fields.name (string) - wi-fi (exists only when wi-fi = true)
"with" and "on" are already added to the stopwords.

In Sphinx this was done with (Rails & ThinkingSphinx):

  Place.search 'italian restaurant with wi-fi on 5th avenue', :match_mode => :extended

But Sphinx uses the proximity_bm25 ranking mode. How can I achieve the same
result with ES? Searching with a text query on '_all' gives no results at all.
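One possible shape for such a request, as a sketch only and not a confirmed solution: the whole phrase is sent to each of the four fields listed above as a non-fuzzy "text" clause inside a "bool" query, so a document scores higher the more fields it matches. The field names come from the task description; leaving out fuzziness, boosts, and explicit analyzers is an assumption made to keep the example small.

```json
{
  "query": {
    "bool": {
      "should": [
        { "text": { "address":     { "query": "italian restaurant wi-fi 5th avenue", "operator": "or" } } },
        { "text": { "kitchens":    { "query": "italian restaurant wi-fi 5th avenue", "operator": "or" } } },
        { "text": { "categories":  { "query": "italian restaurant wi-fi 5th avenue", "operator": "or" } } },
        { "text": { "fields.name": { "query": "italian restaurant wi-fi 5th avenue", "operator": "or" } } }
      ]
    }
  }
}
```

This does not reproduce Sphinx's proximity_bm25 ranking; it only rewards matching across several fields.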

On Wednesday, October 24, 2012 12:36:54 PM UTC+4, simonw wrote:

Hey man,

I don't think your query is doing what you think it is doing.

--


(system) #4