Simple query on two fields with preference to one of them


#1

I have a document with two fields: "title" and "description".

When performing a query, I want to give a significant preference to documents that have the search term in their "title", i.e. show all documents with the term in "title" first, and only then show documents with the term in "description". In other words, documents with the term in "title" should score much higher than documents with the term in "description".

I am using SimpleQueryString because I allow users to provide expressions as search terms.

I tried giving different boost values to the "title" and "description" fields (e.g. 100 vs. 1), and I also tried using DisMax with these boost values for each of the queries. Neither seems to work; it looks like the boost values are simply ignored.

Is there any way to achieve what I am looking for with SimpleQueryString?

I am using ES 6.4 with NEST (C#).


(Simon Willnauer) #2

Your approach seems sane. Can you post the queries you are sending? Preferably the JSON string that gets sent to Elasticsearch.


#3

Hi Simon,

Here is what I do (NEST, unfortunately I don't have it as JSON):

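string criteria = "KuKu";

// MyDocument is a placeholder for the indexed document type, which the original post does not name.
Func<QueryContainerDescriptor<MyDocument>, QueryContainer> func = q =>
{
    // Search both fields; Title carries a boost of 100.
    QueryContainer query = q.SimpleQueryString(qs => qs
        .Query(criteria)
        .Fields(f => f
            .Field(fi => fi.Description)
            .Field(fi => fi.Title, 100))
        .DefaultOperator(Operator.And));

    return query;
};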
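
For reference, here is a rough sketch of the request body that the NEST snippet above would be expected to produce, written as a Python dict so it can be printed as JSON. The field names "title" and "description" come from the first post; the exact serialization NEST emits is an assumption here, not captured output.

```python
import json

criteria = "KuKu"

# Assumed JSON equivalent of the NEST SimpleQueryString call above.
# "title^100" is the query DSL syntax for a per-field boost of 100.
body = {
    "query": {
        "simple_query_string": {
            "query": criteria,
            "fields": ["description", "title^100"],
            "default_operator": "and",
        }
    }
}

print(json.dumps(body, indent=2))
```

Comparing this against what the client actually sends is a quick way to check whether the boost reaches Elasticsearch at all.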

I don't see the search results favoring "Title" in any way. Instead, many of the first results (i.e. the results with the highest scores) are based on the "Description" field. I see results with "KuKu" in "Description" appearing before results with "KuKu" in "Title".

I would expect all results with "KuKu" in "Title" to appear before any results with "KuKu" in "Description", since I gave "Title" a much higher boost value.

Is there anything I am doing wrong?


(Simon Willnauer) #4

I suspect that this is an effect of document frequency. For instance, if you have a firstname and a surname field and you have a doc like this { "firstname" : "Paul", "surname" : "Simon" } in the index, you will very likely see it come up if you search for simon on both fields. The reason is that simon is a rare term in the surname field but not in the firstname field. You can try to prevent this by changing the type to type: cross_fields; see this for details. If that doesn't help, I think you should provide a response from the explain API.
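
To make the document-frequency effect concrete, here is a minimal sketch of the IDF term used by Lucene's BM25 similarity (the default in Elasticsearch 6.x). The document counts are invented for illustration. A real score also includes term frequency, length normalization, and the boost, all of which the explain API output would show.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF term of Lucene's BM25: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical index of 10,000 documents (numbers are made up):
# "simon" appears in 500 firstname values but only 5 surname values.
idf_firstname = bm25_idf(10_000, 500)
idf_surname = bm25_idf(10_000, 5)

print(f"firstname idf: {idf_firstname:.2f}")
print(f"surname idf:   {idf_surname:.2f}")
```

The rarer the term is within a field, the larger its IDF, so the same word can contribute very differently depending on which field it matched in.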


#5

Indeed, it looks much better now, using MultiMatch with CrossFields.

However, in order to use CrossFields, I had to give up on SimpleQueryString. This is very unfortunate, since it provides the ability to parse query expressions, which is very important to me.

Is there any way to keep them both?
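
As a sketch of what such a request might look like in the query DSL, here is a multi_match body with type cross_fields, built as a Python dict. The boost of 100 and the "and" operator are carried over from the earlier snippet as assumptions; the post does not show the query that was actually used.

```python
import json

# Assumed multi_match equivalent of the earlier query, using cross_fields
# so that term statistics are blended across the two fields.
body = {
    "query": {
        "multi_match": {
            "query": "KuKu",
            "type": "cross_fields",
            "fields": ["title^100", "description"],
            "operator": "and",
        }
    }
}

print(json.dumps(body, indent=2))
```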


(Simon Willnauer) #6

You can set type on SimpleQueryString as well; it should have the same effect. Another way is to use copy_to in your mapping to copy both fields into one "uber" field, and search just in that one field for scoring. That is common practice.
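
A minimal sketch of the copy_to approach, written as Python dicts for the index mapping and the query. The combined field name "all_text" and the mapping type "_doc" are hypothetical names chosen for illustration; they do not appear in the thread.

```python
import json

# Index mapping: both source fields are copied into one combined field.
# "_doc" (mapping type) and "all_text" (field name) are assumptions.
mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "title": {"type": "text", "copy_to": "all_text"},
                "description": {"type": "text", "copy_to": "all_text"},
                "all_text": {"type": "text"},
            }
        }
    }
}

# The expression syntax of simple_query_string stays available,
# but scoring now uses the term statistics of the single combined field.
query = {
    "query": {
        "simple_query_string": {
            "query": "KuKu",
            "fields": ["all_text"],
            "default_operator": "and",
        }
    }
}

print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```

Because copy_to is a mapping setting, documents already in the index would need to be reindexed before the combined field is populated.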


#7

I can't find anything related to type anywhere in SimpleQueryString's API (I am using NEST). Can you please point me to the relevant method?

Thank you for your help