Giving Fields Different Weights in a Multi-Match Query Using Cross-Fields Technique

maryam_abdullah · January 15, 2019, 8:47am

Hello,

I hope someone can help me.

I was wondering if it's possible to use the cross-fields technique and still give different weights to the fields, because based on what I've read, it's not possible.

What do you think of filtering documents using the cross-fields technique, then scoring them using the best-fields technique?

Thank you.

Mark_Harwood · January 15, 2019, 9:37am

The intention of cross-fields is that the search engine decides what the right field for each search term is.
Any attempt by you to globally favour certain fields undoes the subtle scoring tweaks made to bias each individual search term towards the right field.

If you do choose to boost fields manually, then what you're really using cross-fields logic for is to cap the IDF score for each term to that of the most likely field for the term. That should help prevent the tendency of IDF to favour the most bizarre choices of field for a term (e.g. ranking surname:john very highly). Manual field boosting, however, is still arguably interfering with the careful per-word boosting of terms by overriding it with a ham-fisted global approach that says "I always like field X over Y".
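
For reference, per-field boosts can be attached to a cross_fields query with the caret notation on field names. The sketch below builds the raw request body as a Python dict rather than through NEST, so it can be checked in isolation; the helper name `cross_fields_query` and the boost values are assumptions for illustration, and the field names are borrowed from later in this thread.

```python
import json


def cross_fields_query(text, boosts, operator="and"):
    """Build a multi_match body of type cross_fields with per-field boosts.

    `boosts` maps field name -> boost; each entry becomes "field^boost".
    (Hypothetical helper, not part of any Elasticsearch client.)
    """
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "operator": operator,
                "fields": fields,
            }
        }
    }


# Illustrative boosts only: favour title over summary over content.
body = cross_fields_query("john smith", {"title": 5, "summary": 1, "content": 0.5})
print(json.dumps(body, indent=2))
```

Whether such global boosts help or hurt is exactly the trade-off described above: they sit on top of the per-term adjustments that cross_fields already makes.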

maryam_abdullah · January 15, 2019, 9:54am

So, what should I do in that case?

I can't let all fields have the same weight, 'cause some are way more important than others.

Here's my code:

 var response = elasticClient.Search<Document>(s => s
                .From(offset)
                .Size(size)
                .Index(name)
                .Sort(so => sortByDate
                    ? so.Descending(a => a.Date).Field(f => f.Field("_score").Order(SortOrder.Descending))
                    : so.Field(f => f.Field("_score").Order(SortOrder.Descending)))
                .Query(q => q
                    .FunctionScore(fs => fs.Query(qy => qy
                    .Bool(b => b
                        .Must(m => m
                            .Term(tm => tm.Field(fd => fd.Client)
                                .Value(client)
                            )
                        )
                        .MustNot(mn => mn

                            .Terms(t => t.Field(f => f.Client).Terms(other)))

                        .Should(m => m
                            .QueryString(ss => ss.Query("\"" + query.Data + "\"").QuoteFieldSuffix(".exact").Fields(fd => fd
                                           .Field(f => f.Title,  6)
                                           .Field(f => f.Summary, 6)
                                           .Field(f => f.Content, 1)
                                        .Field(f => f.Tags, 2)
                                        .Field(f => f.Meta, 20)
                                        .Field(f => f.Relations, 20)
                                        .Field(f => f.Type, 3)
                                        .Field(f => f.MainTerms, 3)
                                        .Field(f => f.Terms, 3)
                                        .Field(f => f.RelationsSummary, 10)
                                        )),
                             m => m
                            .MultiMatch(mm => mm
                                    .Fields(fd => fd
                                     .Field(f => f.Title, 5)
                                        .Field(f => f.Content, 0.5)
                                        .Field(f => f.Tags, 0.5)
                                        .Field(f => f.RelationsSummary, 0.5)
                                        .Field(f => f.Meta, 1)
                                    .Field(f => f.Relations, 1)
                                    .Field(f => f.Summary, 1)
                                    )
                                    .Type(TextQueryType.BestFields)
                                    .Operator(Operator.Or)
                                    .Query(query.Data)
                                    .TieBreaker(1)
                                    )
						
                            , m => m
                            .MultiMatch(mm => mm
                              .Fields(fd => fd
                                  .Field(f => f.MainTerms)
                                  .Field(f => f.Tags)
                                  )
                                  .Type(TextQueryType.CrossFields)
                                  .Operator(Operator.And)
                                  .Query(query.Data)
                                  .Boost(13))
                      )
                      .Filter(fr => fr
                      .MultiMatch(mm => mm
                                    .Fields(fd => fd
                                        .Field(f => f.Title)
                                    .Field(f => f.Summary)
                                    .Field(f => f.Content)
                                    .Field(f => f.Tags)
                                    .Field(f => f.Meta)
                                    .Field(f => f.Relations)
                                    .Field(f => f.Type)
                                    .Field(f => f.MainTerms)
                                    .Field(f => f.Terms)

                                    )
                                    .Type(TextQueryType.CrossFields)
                                    .Operator(Operator.Or)
                                    .Query(query.Data)
                                    .MinimumShouldMatch("80%")
                                    )
                                    )
                    )
                    
                    
                )
            )
            )
            );
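
One detail in the query above is worth spelling out: the best-fields clause sets TieBreaker(1). A best-fields multi_match scores a document as the best field's score plus the tie breaker times the sum of the other matching fields' scores, so a tie breaker of 1 adds all field scores together and no longer favours a single best field. The following is a minimal sketch of that arithmetic with made-up per-field scores; `dis_max_score` is a hypothetical helper, not an Elasticsearch API.

```python
def dis_max_score(field_scores, tie_breaker):
    """Combine per-field scores the way a dis_max / best_fields query does:
    best score plus tie_breaker times the sum of the remaining scores."""
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


# Made-up scores for one document matched in three fields.
scores = [3.0, 1.0, 0.5]

only_best = dis_max_score(scores, 0.0)   # pure "best field wins"
summed = dis_max_score(scores, 1.0)      # every field counts fully
blended = dis_max_score(scores, 0.3)     # somewhere in between

print(only_best, summed, blended)
```

A value between 0 and 1 keeps the best field dominant while still rewarding documents that match in several fields.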

Mark_Harwood · January 15, 2019, 10:05am

Cross fields makes the most difference when there's typically a right and a wrong context for search terms.
An example would be the query John Smith, whose terms are targeted at the first_name and last_name fields. The naturally high IDF score for last_name:john needs dampening, so cross fields will adjust IDF to favour what seems the most natural context for each term.
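
The dampening effect can be illustrated with a toy calculation. The sketch below uses the BM25 IDF formula and a deliberately simplified model of blending, in which each term gets the largest document frequency it has in any of the fields; the real implementation is more involved, and the document counts here are invented for illustration.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25 inverse document frequency: rarer terms get higher values."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


N = 10_000  # invented index size

# Invented document frequencies: "john" is common as a first name and
# rare as a last name; "smith" is the other way round.
doc_freq = {
    "first_name": {"john": 800, "smith": 5},
    "last_name": {"john": 5, "smith": 600},
}

results = {}
for term in ("john", "smith"):
    per_field = {f: bm25_idf(N, doc_freq[f][term]) for f in doc_freq}
    # Simplified blending: use the highest document frequency across fields.
    blended = bm25_idf(N, max(doc_freq[f][term] for f in doc_freq))
    results[term] = {"per_field": per_field, "blended": blended}
    print(term, per_field, "blended:", round(blended, 3))
```

Without blending, last_name:john gets a much higher IDF than first_name:john simply because it is rare there. With blending, the term is scored as if it were as common everywhere as it is in its most natural field.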

In your given example you have fields title, summary and content. Unlike first_name and last_name these are probably just different-length forms of the same types of content so I imagine search terms appear equally commonly in each. Longer texts may contain more instances of words that argue a case (e.g. "however, ") but that shouldn't be a factor in scoring the typical search.

Manual boosting that favours shorter, information-rich fields like title probably makes sense.

maryam_abdullah · January 15, 2019, 10:24am

I see, but it's also possible (maybe rare) to have for instance the first name in the title and the last name in the summary/content.
For this reason, filtering at first based on the cross-fields technique seemed like the right idea. After that, I score documents based on the best fields technique 'cause some fields are more important than others.
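
The two-stage idea described here can be sketched as a single bool query: a cross_fields multi_match in the filter clause decides which documents match without contributing to the score, and a best_fields multi_match in the should clause supplies the score with per-field boosts. The body is built as a plain Python dict so it can be checked on its own; `filter_then_score` is a hypothetical helper, and the fields and boosts are illustrative values taken loosely from the code earlier in the thread.

```python
import json


def filter_then_score(text, filter_fields, scoring_boosts):
    """Filter with cross_fields, score with boosted best_fields.

    Because the bool query has a filter clause, the should clause is
    optional for matching and only affects the score.
    (Hypothetical helper, not part of any Elasticsearch client.)
    """
    return {
        "query": {
            "bool": {
                "filter": [{
                    "multi_match": {
                        "query": text,
                        "type": "cross_fields",
                        "operator": "or",
                        "minimum_should_match": "80%",
                        "fields": list(filter_fields),
                    }
                }],
                "should": [{
                    "multi_match": {
                        "query": text,
                        "type": "best_fields",
                        "fields": [f"{n}^{b}" for n, b in scoring_boosts.items()],
                    }
                }],
            }
        }
    }


body = filter_then_score(
    "john smith",
    ["title", "summary", "content"],
    {"title": 5, "summary": 1, "content": 0.5},
)
print(json.dumps(body, indent=2))
```

With this shape, a document with the first name in the title and the last name in the summary still passes the filter, while the ranking comes entirely from the boosted best-fields clause.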

maryam_abdullah · January 15, 2019, 12:22pm

I was also wondering how I am supposed to add the common terms functionality to my code.

system · February 12, 2019, 12:23pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
