[Theory] Improving search result relevance?

My question is still open.
What is the most general solution to this?

I tried querying with use_dis_max, but it doesn't change much.
I would try setting a threshold on the score, but every single occurrence of the number
5 in the index has a score roughly the same as the most relevant results.
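For context, a dis_max query (which query_string builds per term across the listed fields when use_dis_max is on) roughly scores a match by its single best field, plus tie_breaker times the scores of the other matching fields. The plain-Python sketch below shows only that arithmetic, with made-up per-field scores. It changes how field scores are combined, not which terms must match, so a document that matches only "5" is still returned. That may be why use_dis_max makes little difference here.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores the way a dis_max query does:
    the best field's score plus tie_breaker times the sum of the others."""
    if not field_scores:
        return 0.0
    ordered = sorted(field_scores, reverse=True)
    return ordered[0] + tie_breaker * sum(ordered[1:])

# Hypothetical per-field scores (title, tags, description) for one document.
scores = [1.2, 0.4, 0.1]
print(dis_max_score(scores))       # best field only
print(dis_max_score(scores, 0.3))  # best field plus 30% of the other fields
```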

On Tuesday, January 29, 2013 10:19:33 AM UTC+6, Rauan Maemirov wrote:

Hi, Clinton.

I tried, but I still get every occurrence of 5.

Any other suggestions? I already use query_string field boosting, like "fields":
["title^2", "tags^2", "description"]
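As a sketch, the request body described above can be built like this in Python. The function name is hypothetical, and sending the body to a cluster is left out. Note that the "field^N" boosts only scale each field's contribution to the score; they do not stop a document that matches a single term from being returned. Parameter availability (e.g. use_dis_max) may differ between Elasticsearch versions.

```python
import json

def build_query_string_body(text, fields, use_dis_max=True):
    """Build a query_string search body with per-field boosts
    ("field^N" syntax), as used in this thread."""
    return {
        "query": {
            "query_string": {
                "query": text,
                "fields": fields,
                "use_dis_max": use_dis_max,
            }
        }
    }

body = build_query_string_body("iphone 5", ["title^2", "tags^2", "description"])
print(json.dumps(body, indent=2))
```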

On Monday, January 28, 2013 9:58:27 PM UTC+6, Clinton Gormley wrote:

On Sun, 2013-01-27 at 20:17 -0800, Rauan Maemirov wrote:

Hi, all. I'm having a slightly different problem, but I guess in
essence it's the same.

I have an index of items and I'm trying to search by title for 'iphone 5'.
I get well-sorted results: 'iphone 5' first, then all the others, such as 'iphone 3g',
'iphone 4s', etc.

Now my problem is that 'Loreal Elseve 5' also shows up in the search
results, i.e. Elasticsearch includes every entry containing the
number 5 (with a pretty high score). How could I solve this?

You could try setting minimum_should_match to e.g. "60%"

clint
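One thing worth checking with a percentage: the Elasticsearch minimum_should_match documentation says the number computed from a percentage is rounded down. If "iphone 5" is analyzed into two terms, "60%" resolves to floor(2 × 0.6) = 1 required term, which behaves like a plain OR; that may be why the setting made no difference here. A value such as "100%" or "2" would require both terms. The sketch below (function name hypothetical) resolves plain integers and percentages only; combination forms like "3<90%" are not handled.

```python
def required_clauses(spec, optional_clauses):
    """Resolve a minimum_should_match value (integer or percentage,
    optionally negative) into how many optional clauses must match."""
    spec = str(spec).strip()
    negative = spec.startswith("-")
    body = spec.lstrip("-")
    if body.endswith("%"):
        # Percentages are applied to the clause count and rounded down.
        amount = optional_clauses * int(body[:-1]) // 100
    else:
        amount = int(body)
    # Negative values say how many clauses may be missing.
    required = optional_clauses - amount if negative else amount
    return max(0, min(required, optional_clauses))

# "iphone 5" -> two optional term clauses
for spec in ["60%", "100%", "2"]:
    print(spec, "->", required_clauses(spec, 2))  # 1, 2, 2
```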

I don't want to filter out all numbers at the indexing phase, because
they're very useful when I search for a keyword followed
by a number or version.

On Wednesday, November 28, 2012 9:51:56 AM UTC+6, Zachary Tong wrote:
I'm curious about some practical tips to improve search result
relevance. Currently, I'm tokenizing my fields with shingles
and performing a simple "text" search on the shingled field.
I've found this gives better results than other things I've
tried (combinations of: terms, n-grams, phrase, shingles).
However, the search results leave something to be desired. I
imagine there are ways to fix this; I just don't know how.

    For example, if I search for "Servo Gear", it will match all 
    documents with either "Servo" or "Gear" and order them roughly 
    based on frequency.  There is some preference to documents 
    that say "Servo Gear" explicitly, but often a document that 
    lists "Gear" four times will rank higher simply because it has 
    the term more frequently.  Ideally, something that matches the 
    phrase would rank higher. 
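To see why a unigram-free shingle field can help with the "Servo Gear" case: a document that just repeats "Gear" produces shingles like "gear gear" but never "servo gear", so only documents containing the adjacent pair can match the query's shingle. The sketch below is a simplified stand-in for a shingle token filter (lowercase plus whitespace splitting, not the real analyzer chain), and its output order may differ from what Elasticsearch emits.

```python
def shingles(text, min_size=2, max_size=2, output_unigrams=False):
    """Return word n-grams ("shingles") for a lowercased,
    whitespace-split text: a simplified model of a shingle filter."""
    tokens = text.lower().split()
    out = list(tokens) if output_unigrams else []
    for size in range(min_size, max_size + 1):
        for i in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[i:i + size]))
    return out

query = shingles("Servo Gear")
doc_a = shingles("servo gear for robots")
doc_b = shingles("gear gear gear gear")
print(query, any(s in doc_a for s in query), any(s in doc_b for s in query))
```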
    
    
    So, how should I attack this problem?  I'm thinking something 
    like this: 
          * Analyzers 
                  * Regular term tokenizer 
                  * Shingles, but turn off unigrams 
          * Search both terms and shingles, but boost shingles so 
            that phrase matches are sorted higher 
          * Perhaps search using span_near so that non-exact 
            phrases can be matched too?  Would it be better to do 
            something like a phrase query with slop instead? 
    Does that make sense?  I understand ES well enough from a 
    technical point of view, but I'm having a hard time 
    implementing more subtle search algorithms that can surface 
    the correct documents. 
    
    
    Thanks! 
    -Zach 
