Giving Fields Different Weights in a Multi-Match Query Using Cross-Fields Technique


(Maryam Abdullah) #1

Hello,

I hope someone can help me.

I was wondering if it's possible to use the cross-fields technique and still give different weights to the fields, because based on what I've read, it's not possible.

What do you think of filtering documents using the cross-fields technique, then scoring them using the best-fields technique?

Thank you.


(Mark Harwood) #2

The intention of cross-fields is that the search engine decides what the right field for each search term is.
Any attempt by you to globally favour certain fields undoes the subtle scoring tweaks made to bias each individual search term towards the right field.

If you do choose to manually boost fields, then what you're really using cross-fields logic for is to cap the IDF score for each term to that of its most likely field. That should help avoid the tendency of IDF to favour the most bizarre choice of field for a term (e.g. ranking surname:john very highly). Manual field boosting, however, is still arguably interfering with the careful per-word boosting of terms, overriding it with a ham-fisted global approach that says "I always like field X over Y".
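
For reference, the query DSL does accept per-field boosts together with the cross-fields type, so the combination described above can be written down directly. A minimal sketch, assuming NEST 7.x; the Article class, its properties, and the boost values are hypothetical and only there for illustration:

```csharp
using Nest;

// Hypothetical document type, used only for this sketch.
public class Article
{
    public string Title { get; set; }
    public string Summary { get; set; }
    public string Content { get; set; }
}

public static class CrossFieldsBoostSketch
{
    // Builds a cross_fields multi_match query with manual per-field boosts.
    // The boost values are arbitrary placeholders, not recommendations.
    public static QueryContainer Build(string userQuery) =>
        Query<Article>.MultiMatch(mm => mm
            .Fields(fd => fd
                .Field(f => f.Title, 5)
                .Field(f => f.Summary, 2)
                .Field(f => f.Content))
            .Type(TextQueryType.CrossFields)
            .Operator(Operator.And)
            .Query(userQuery));
}
```

The boosts here apply to the whole field, which is exactly the global "field X over Y" preference discussed above, layered on top of the per-term adjustment that cross-fields makes.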


(Maryam Abdullah) #3

So, what should I do in that case?

I can't let all fields have the same weight, 'cause some are way more important than others.

Here's my code:

var response = elasticClient.Search<Document>(s => s
    .From(offset)
    .Size(size)
    .Index(name)
    // Sort by date first (then by score) when requested; otherwise by score only.
    .Sort(so => sortByDate
        ? so.Descending(a => a.Date).Field(f => f.Field("_score").Order(SortOrder.Descending))
        : so.Field(f => f.Field("_score").Order(SortOrder.Descending)))
    .Query(q => q
        .FunctionScore(fs => fs.Query(qy => qy
            .Bool(b => b
                // Restrict results to the requested client.
                .Must(m => m
                    .Term(tm => tm.Field(fd => fd.Client).Value(client)))
                // Exclude documents whose client is listed in "other".
                .MustNot(mn => mn
                    .Terms(t => t.Field(f => f.Client).Terms(other)))
                .Should(
                    // Exact-phrase match against the ".exact" sub-fields.
                    m => m.QueryString(ss => ss
                        .Query("\"" + query.Data + "\"")
                        .QuoteFieldSuffix(".exact")
                        .Fields(fd => fd
                            .Field(f => f.Title, 6)
                            .Field(f => f.Summary, 6)
                            .Field(f => f.Content, 1)
                            .Field(f => f.Tags, 2)
                            .Field(f => f.Meta, 20)
                            .Field(f => f.Relations, 20)
                            .Field(f => f.Type, 3)
                            .Field(f => f.MainTerms, 3)
                            .Field(f => f.Terms, 3)
                            .Field(f => f.RelationsSummary, 10))),
                    // Best-fields scoring with manual per-field weights.
                    // A tie_breaker of 1 adds every other matching field's score to the best one.
                    m => m.MultiMatch(mm => mm
                        .Fields(fd => fd
                            .Field(f => f.Title, 5)
                            .Field(f => f.Content, 0.5)
                            .Field(f => f.Tags, 0.5)
                            .Field(f => f.RelationsSummary, 0.5)
                            .Field(f => f.Meta, 1)
                            .Field(f => f.Relations, 1)
                            .Field(f => f.Summary, 1))
                        .Type(TextQueryType.BestFields)
                        .Operator(Operator.Or)
                        .Query(query.Data)
                        .TieBreaker(1)),
                    // Cross-fields match that requires all terms, boosted as a whole.
                    m => m.MultiMatch(mm => mm
                        .Fields(fd => fd
                            .Field(f => f.MainTerms)
                            .Field(f => f.Tags))
                        .Type(TextQueryType.CrossFields)
                        .Operator(Operator.And)
                        .Query(query.Data)
                        .Boost(13)))
                // Unscored cross-fields filter: at least 80% of the terms must match somewhere.
                .Filter(fr => fr
                    .MultiMatch(mm => mm
                        .Fields(fd => fd
                            .Field(f => f.Title)
                            .Field(f => f.Summary)
                            .Field(f => f.Content)
                            .Field(f => f.Tags)
                            .Field(f => f.Meta)
                            .Field(f => f.Relations)
                            .Field(f => f.Type)
                            .Field(f => f.MainTerms)
                            .Field(f => f.Terms))
                        .Type(TextQueryType.CrossFields)
                        .Operator(Operator.Or)
                        .Query(query.Data)
                        .MinimumShouldMatch("80%")))))));

(Mark Harwood) #4

Cross fields makes the most difference when there's typically a right and a wrong context for search terms.
An example would be the query John Smith: its terms are targeted at the first_name and last_name fields, and the naturally high IDF score for last_name:john needs dampening. Cross fields will adjust IDF to favour what seems the most natural context for each term.
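
The dampening effect can be illustrated with a toy calculation. This is a deliberately simplified model, not the actual Lucene implementation: it assumes a BM25-style IDF formula and assumes that blending amounts to giving the term, in every field, the document frequency of the field where it is most common. The document counts are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlendedIdfSketch
{
    // BM25-style inverse document frequency: rarer terms score higher.
    public static double Idf(long docCount, long docFreq) =>
        Math.Log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));

    // Simplified blending (an assumption of this sketch): treat the term as if it
    // were as common in every field as it is in its most common field.
    public static long BlendedDocFreq(IDictionary<string, long> docFreqByField) =>
        docFreqByField.Values.Max();

    public static void Main()
    {
        const long docCount = 10000; // hypothetical index size

        // Hypothetical document frequencies for the term "john".
        var john = new Dictionary<string, long>
        {
            ["first_name"] = 1000, // common as a first name
            ["last_name"] = 10     // rare as a surname, so its raw IDF is high
        };

        foreach (var entry in john)
        {
            Console.WriteLine($"{entry.Key}:john raw idf = {Idf(docCount, entry.Value):F2}");
        }

        // After blending, both fields share the lower IDF of the common field.
        Console.WriteLine($"blended idf = {Idf(docCount, BlendedDocFreq(john)):F2}");
    }
}
```

With the raw statistics, last_name:john gets a much higher IDF than first_name:john; with the blended document frequency, both fields use the same, lower value, which is the capping behaviour described above.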

In your given example you have fields title, summary and content. Unlike first_name and last_name these are probably just different-length forms of the same types of content so I imagine search terms appear equally commonly in each. Longer texts may contain more instances of words that argue a case (e.g. "however, ") but that shouldn't be a factor in scoring the typical search.

Manual boosting that favours shorter, information-rich fields like title probably makes sense.


(Maryam Abdullah) #5

I see, but it's also possible (maybe rare) to have for instance the first name in the title and the last name in the summary/content.
For this reason, filtering first with the cross-fields technique seemed like the right idea. After that, I score documents with the best-fields technique, because some fields are more important than others.
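
Stripped down to its essentials, the filter-then-score idea looks roughly like the sketch below. It assumes NEST 7.x; the Article class, the field list, and the boost values are hypothetical stand-ins, not the real mapping.

```csharp
using Nest;

// Hypothetical document type, used only for this sketch.
public class Article
{
    public string Title { get; set; }
    public string Summary { get; set; }
    public string Content { get; set; }
}

public static class FilterThenScoreSketch
{
    public static SearchDescriptor<Article> Build(string userQuery) =>
        new SearchDescriptor<Article>()
            .Query(q => q
                .Bool(b => b
                    // Filter context: decides which documents match, contributes no score.
                    // Cross-fields lets the terms be spread over several fields.
                    .Filter(fr => fr
                        .MultiMatch(mm => mm
                            .Fields(fd => fd
                                .Field(f => f.Title)
                                .Field(f => f.Summary)
                                .Field(f => f.Content))
                            .Type(TextQueryType.CrossFields)
                            .Operator(Operator.And)
                            .Query(userQuery)))
                    // Query context: only ranks the documents that passed the filter,
                    // using best-fields scoring with manual per-field weights.
                    .Should(sh => sh
                        .MultiMatch(mm => mm
                            .Fields(fd => fd
                                .Field(f => f.Title, 5)
                                .Field(f => f.Summary, 1)
                                .Field(f => f.Content, 0.5))
                            .Type(TextQueryType.BestFields)
                            .Query(userQuery)))));
}
```

Because the bool query has a filter clause, the should clause is optional for matching and only affects the score, so the two techniques do not interfere with each other.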


(Maryam Abdullah) #6

I was also wondering how I am supposed to add the common terms functionality to my code.

