Multi-Fields search using Span Queries with fuzziness in Elasticsearch

Nikesh · October 16, 2018, 7:34am

Hi all,
I am using span queries to get match_phrase behaviour with fuzziness. This works on a single field, but because I am wrapping a fuzzy query in span_multi, I cannot get the same query to search across multiple fields. Is there a way to overcome this?

{
  "from": 0,
  "size": 10,
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "TITLE": {
                  "fuzziness": "2",
                  "value": "claims"
                }
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "TITLE": {
                  "fuzziness": "2",
                  "value": "novation"
                }
              }
            }
          }
        }
      ],
      "slop": 2,
      "in_order": true
    }
  },
  "highlight": {
    "type": "fvh",
    "fields": {
      "*": {}
    }
  }
}

Mark_Harwood · October 16, 2018, 8:39am

I can think of a couple of options:

Query-time: use a bool query with a should array containing versions of your span query written for each field
Index-time: use copy_to in your mapping to combine the content of multiple fields into a single searchable field and then search just that
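
As a rough illustration of the index-time option, here is a minimal Python sketch that builds a mapping in which each source field copies its text into one combined field. The names `SUMMARY` and `all_text` are made up for the example, and the layout assumes typeless mappings (Elasticsearch 7+); on 6.x the `properties` object would sit under a mapping type name.

```python
import json


def build_copy_to_mapping(source_fields, target_field="all_text"):
    """Return a mapping body where every source field copies into one field.

    `target_field` is a hypothetical name chosen for this sketch.
    """
    properties = {
        name: {"type": "text", "copy_to": target_field}
        for name in source_fields
    }
    # The combined field itself is a plain text field.
    properties[target_field] = {"type": "text"}
    return {"mappings": {"properties": properties}}


# "TITLE" is from the thread; "SUMMARY" is an invented second field.
mapping = build_copy_to_mapping(["TITLE", "SUMMARY"])
print(json.dumps(mapping, indent=2))
```

The span query would then name only the combined field instead of each source field.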

Nikesh · October 16, 2018, 8:51am

Hi, thanks for your time and response.
I have about 150 fields and thousands of documents, so I don't think an index-time operation would be feasible because of the large index size.
Can you please provide a link or an example where bool with a should array is used with span queries, or show the format of its usage?

Mark_Harwood · October 16, 2018, 8:55am

The bool query provides the building blocks for assembling combinations of all queries (span included) using Boolean logic. You want an OR (span match on field X OR field Y) so you need boolean logic to assemble that expression.

Nikesh · October 16, 2018, 9:00am

As I understand it, I have to put the span query inside a bool query. Where do I list all the fields in this query, since the span query doesn't support a fields parameter?

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "should": {
        "span_near": {
          "clauses": [
            {
              "span_multi": {
                "match": {
                  "fuzzy": {
                    "TITLE": {
                      "fuzziness": "2",
                      "value": "claims"
                    }
                  }
                }
              }
            },
            {
              "span_multi": {
                "match": {
                  "fuzzy": {
                    "TITLE": {
                      "fuzziness": "2",
                      "value": "novation"
                    }
                  }
                }
              }
            }
          ],
          "slop": 2,
          "in_order": true
        }
      }
    }
  },
  "highlight": {
    "type": "fvh",
    "fields": {
      "*": {}
    }
  }
}

Mark_Harwood · October 16, 2018, 9:03am

You have to repeat yourself: use multiple span query objects, each with a different field name (to provide the relevant context) but containing the same search terms.
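
The repetition does not have to be written by hand. A minimal Python sketch, assuming the request body is assembled client-side before it is sent: one helper builds the span_near query from the first post for a given field, and a second wraps one copy per field in a bool/should. The helper names and the `SUMMARY` field are invented for illustration.

```python
import json


def fuzzy_span_near(field, terms, fuzziness="2", slop=2, in_order=True):
    """Build one span_near query of fuzzy terms for a single field."""
    clauses = [
        {"span_multi": {"match": {"fuzzy": {
            field: {"value": term, "fuzziness": fuzziness}
        }}}}
        for term in terms
    ]
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}


def fuzzy_phrase_any_field(fields, terms):
    """OR together one span_near query per field inside a bool/should."""
    return {"bool": {"should": [fuzzy_span_near(f, terms) for f in fields]}}


# "TITLE" is from the thread; "SUMMARY" is an invented second field.
query = fuzzy_phrase_any_field(["TITLE", "SUMMARY"], ["claims", "novation"])
print(json.dumps({"query": query}, indent=2))
```

Passing a list of 150 field names produces 150 should clauses, which is the repetition described above, just generated rather than typed.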

Nikesh · October 16, 2018, 9:13am

I am looking for a query similar to this, but using span queries. Since I have about 150 fields, it doesn't feel right to have 150 span query objects. Is there a way to specify a "fields" parameter as in the example below?

 "should": [
   {
     "multi_match": {
       "query": "there",
       "fields": []
     }
   }
]

Mark_Harwood · October 16, 2018, 9:14am

No, sorry. Span is used on text fields and generally an index has many structured fields but only one or two unstructured text fields to capture what can't be expressed in structured data.

Nikesh · October 16, 2018, 9:18am

Thanks for your honest response.
Is there another method that lets me use fuzziness with a match_phrase query?

Mark_Harwood · October 16, 2018, 9:27am

You perhaps don't need a phrase/span query that supports fuzzy.
A common strategy is to have a big bool query with a should array filled with different forms of running the same user input, ranging from the sloppy "any word plus fuzzy" to the very strict e.g. ANDed terms or exact phrase matches. Docs which satisfy more of the given clauses will naturally rank higher.
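
A minimal Python sketch of that strategy, assuming three layers are enough: a loose fuzzy match on any word, a stricter match that requires all words, and an exact phrase match. The function name and the `SUMMARY` field are invented for the example, and the choice of `"AUTO"` fuzziness is an assumption rather than something from this thread.

```python
import json


def layered_query(text, fields):
    """Run the same user input in three forms, from sloppy to strict."""
    return {"bool": {"should": [
        # Loosest: any word may match, with fuzzy matching allowed.
        {"multi_match": {"query": text, "fields": fields, "fuzziness": "AUTO"}},
        # Stricter: every word must appear in a field.
        {"multi_match": {"query": text, "fields": fields, "operator": "and"}},
        # Strictest: the words must appear as an exact phrase.
        {"multi_match": {"query": text, "fields": fields, "type": "phrase"}},
    ]}}


# "TITLE" is from the thread; "SUMMARY" is an invented second field.
body = {"query": layered_query("claims novation", ["TITLE", "SUMMARY"])}
print(json.dumps(body, indent=2))
```

A document containing the exact phrase satisfies all three clauses and so scores above one that only matches a misspelled word.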

Nikesh · October 16, 2018, 10:30am

I did follow the same strategy, but it takes a long time to process even a single document.

{
  "from": 0,
  "size": 24,
  "query": {
    "bool": {
      
      "should": [
        {
          "multi_match": {
            "query": "current",
            "type": "best_fields",
            "fields": []
          }
        },
        {
          "query_string": {
            "query": "*current*",
            "fields": []
          }
        },
        {
          "multi_match": {
            "query": "current",
            "fuzziness": "1",
            "fields": []
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "highlight": {
    "type": "fvh",
    "fields": {
      "*": {}
    }
  }
}

I posted a separate question about this: Highlighting with fvh taking long time in Elasticsearch

Mark_Harwood · October 16, 2018, 10:34am

Try taking away request elements until you find the culprit. Wildcards and highlighting are both potential performance hogs. You could also try the unified highlighter type. "Fast Vector Highlighter" isn't always as fast as the name suggests.
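
One way to do that systematically is to generate variants of the request, each with a single element changed, and compare the `took` value in each search response. A minimal Python sketch, assuming the request has the bool/should plus highlight shape used in this thread; the function name is invented, and sending the requests is left to whatever client is in use.

```python
import copy


def ablation_variants(request):
    """Yield (label, body) pairs, each differing from the request in one element."""
    if "highlight" in request:
        body = copy.deepcopy(request)
        del body["highlight"]
        yield "without highlight", body

        body = copy.deepcopy(request)
        body["highlight"]["type"] = "unified"
        yield "with unified highlighter", body

    should = request.get("query", {}).get("bool", {}).get("should", [])
    # Only drop clauses when should is a list with more than one entry.
    if isinstance(should, list) and len(should) > 1:
        for i, clause in enumerate(should):
            body = copy.deepcopy(request)
            del body["query"]["bool"]["should"][i]
            kind = next(iter(clause))  # e.g. "multi_match" or "query_string"
            yield f"without should[{i}] ({kind})", body


request = {
    "query": {"bool": {"should": [
        {"multi_match": {"query": "current", "fields": []}},
        {"query_string": {"query": "*current*", "fields": []}},
    ]}},
    "highlight": {"type": "fvh", "fields": {"*": {}}},
}
for label, body in ablation_variants(request):
    print(label)
```

If the variant without the wildcard query_string clause or without highlighting is much faster than the rest, that element is the likely culprit.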

Nikesh · October 16, 2018, 10:59am

My project demands fuzziness, wildcards, and highlighting. Unified is indeed faster than fvh. Will the unified highlighter be deprecated in the near future? Unified takes around 4571ms for a single document. Is that because of the edge n-gram analyzer that I have used?

Mark_Harwood · October 16, 2018, 11:06am

This is straying from the original topic. In the interests of keeping things focused, I suggest opening another issue to concentrate on questions around highlighter performance. It would help to do some investigation with your data and settings first, e.g. the effects of ngram sizes and numbers of fields; otherwise you'll just be waiting for someone to ask you for that additional information before they can offer a diagnosis.

Nikesh · October 16, 2018, 11:23am

Thank you

system · November 13, 2018, 11:24am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
