Multi-Fields search using Span Queries with fuzziness in Elasticsearch


(Nikesh) #1

Hi all,
I am using span queries to get match_phrase-style matching with fuzziness. This works on a single field, but because I am wrapping a fuzzy query in span_multi, I cannot get it to work for a search across multiple fields. Is there a way to overcome this?

{
  "from": 0,
  "size": 10,
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "TITLE": {
                  "fuzziness": "2",
                  "value": "claims"
                }
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "TITLE": {
                  "fuzziness": "2",
                  "value": "novation"
                }
              }
            }
          }
        }
      ],
      "slop": 2,
      "in_order": true
    }
  },
  "highlight": {
    "type": "fvh",
    "fields": {
      "*": {}
    }
  }
}

(Mark Harwood) #2

I can think of a couple of options:

  • Query-time: use a bool query with a should array containing versions of your span query written for each field
  • Index-time: use copy_to in your mapping to combine the content of multiple fields into a single searchable field and then search just that
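
As a rough sketch of the index-time option: the mapping below copies each text field into one combined field, so the original span_near query could then be pointed at that single field. It is written in Python only to assemble the JSON body. BODY and ALL_TEXT are made-up names for illustration (only TITLE comes from the question), and the typeless "mappings" layout assumes Elasticsearch 7 or later.

```python
import json

# Hypothetical field names: only TITLE appears in the original question.
SOURCE_FIELDS = ["TITLE", "BODY"]
COMBINED_FIELD = "ALL_TEXT"


def build_mapping(source_fields, combined_field):
    """Build a mapping in which every source field is copied into one text field."""
    properties = {
        name: {"type": "text", "copy_to": combined_field}
        for name in source_fields
    }
    # The combined field itself is a plain text field that receives the copies.
    properties[combined_field] = {"type": "text"}
    # Assumption: typeless mapping layout (Elasticsearch 7+).
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    print(json.dumps(build_mapping(SOURCE_FIELDS, COMBINED_FIELD), indent=2))
```

Note that copied values are not added to _source, so highlighting on the combined field would probably need extra setup such as storing that field.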

(Nikesh) #3

Hi, thanks for your time and response.
I have about 150 fields and thousands of documents, so I don't think an index-time operation would be feasible because of the resulting index size.
Can you please provide a link or an example where a bool query with a should array is used with span queries, or show the format of its usage?


(Mark Harwood) #4

The bool query provides the building blocks for assembling combinations of all queries (span included) using Boolean logic. You want an OR (span match on field X OR field Y) so you need boolean logic to assemble that expression.


(Nikesh) #5

As I understand it, I have to put the span query inside a bool query. Where do I list all the fields in this query, since span queries don't support a fields parameter?

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "should": {
        "span_near": {
          "clauses": [
            {
              "span_multi": {
                "match": {
                  "fuzzy": {
                    "TITLE": {
                      "fuzziness": "2",
                      "value": "claims"
                    }
                  }
                }
              }
            },
            {
              "span_multi": {
                "match": {
                  "fuzzy": {
                    "TITLE": {
                      "fuzziness": "2",
                      "value": "novation"
                    }
                  }
                }
              }
            }
          ],
          "slop": 2,
          "in_order": true
        }
      }
    }
  },
  "highlight": {
    "type": "fvh",
    "fields": {
      "*": {}
    }
  }
}

(Mark Harwood) #6

You have to repeat yourself: use multiple span query objects, each with a different field name (to provide the relevant context) but containing the same search terms, i.e.
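
A rough sketch of what that repetition could look like when it is generated rather than typed by hand. This is Python that only builds the request body. BODY and SUMMARY are hypothetical field names; TITLE, the two terms, the slop, and the fuzziness are taken from the earlier posts. Each span_near is built for exactly one field, because all clauses inside a span_near have to target the same field.

```python
import json


def fuzzy_span(field, term, fuzziness="2"):
    """One span_multi clause wrapping a fuzzy query against a single field."""
    return {
        "span_multi": {
            "match": {"fuzzy": {field: {"value": term, "fuzziness": fuzziness}}}
        }
    }


def span_near_for_field(field, terms, slop=2, in_order=True):
    """A span_near whose clauses all target the same field."""
    return {
        "span_near": {
            "clauses": [fuzzy_span(field, term) for term in terms],
            "slop": slop,
            "in_order": in_order,
        }
    }


def multi_field_span_query(fields, terms):
    """OR together one span_near per field using bool/should."""
    return {
        "bool": {
            "should": [span_near_for_field(field, terms) for field in fields],
            "minimum_should_match": 1,
        }
    }


if __name__ == "__main__":
    # BODY and SUMMARY are hypothetical; only TITLE appears in the thread.
    body = {
        "from": 0,
        "size": 10,
        "query": multi_field_span_query(
            ["TITLE", "BODY", "SUMMARY"], ["claims", "novation"]
        ),
    }
    print(json.dumps(body, indent=2))
```

With 150 fields this is still 150 clauses in the request, but they no longer have to be maintained by hand.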


(Nikesh) #7

I am looking for a query similar to this one, but using span queries. Since I have about 150 fields, it doesn't feel right to have 150 span query objects. Is there a way to specify a "fields" parameter like in the example below?

 "should": [
        {
          "multi_match": {
            "query": "there",
           "fields": [
            ]
          }
        }
]

(Mark Harwood) #8

No, sorry. Span queries are used on text fields, and an index generally has many structured fields but only one or two unstructured text fields to capture what can't be expressed in structured data.


(Nikesh) #9

Thanks for your honest response.
Is there another method where I can use fuzziness with a match_phrase query?


(Mark Harwood) #10

You perhaps don't need a phrase/span query that supports fuzzy.
A common strategy is to have a big bool query with a should array filled with different forms of running the same user input, ranging from the sloppy "any word plus fuzzy" to the very strict e.g. ANDed terms or exact phrase matches. Docs which satisfy more of the given clauses will naturally rank higher.
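
A minimal sketch of that layering, again in Python just to assemble the body. The function name, the field list, and the boost values are placeholders made up for illustration, not recommendations.

```python
import json


def layered_query(user_input, fields):
    """Run the same input in several forms, from sloppy to strict."""
    return {
        "bool": {
            "should": [
                # Sloppiest: any word may match, with fuzzy matching allowed.
                {
                    "multi_match": {
                        "query": user_input,
                        "fields": fields,
                        "fuzziness": "AUTO",
                    }
                },
                # Stricter: all words must appear in a field (ANDed terms).
                {
                    "multi_match": {
                        "query": user_input,
                        "fields": fields,
                        "operator": "and",
                        "boost": 2,
                    }
                },
                # Strictest: the words must appear as an exact phrase.
                {
                    "multi_match": {
                        "query": user_input,
                        "fields": fields,
                        "type": "phrase",
                        "boost": 4,
                    }
                },
            ],
            "minimum_should_match": 1,
        }
    }


if __name__ == "__main__":
    # TITLE is from the thread; BODY is a hypothetical second field.
    print(json.dumps({"query": layered_query("claims novation", ["TITLE", "BODY"])}, indent=2))
```

A document matching only the fuzzy clause is still returned, while one that also satisfies the AND and phrase clauses collects score from all three.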


(Nikesh) #11

I did follow the same strategy, but it takes a long time to process even a single document.

{
  "from": 0,
  "size": 24,
  "query": {
    "bool": {
      
      "should": [
        {
          "multi_match": {
            "query": "current",
            "type": "best_fields",
            "fields": []
          }
        },
        {
          "query_string": {
            "query": "*current*",
            "fields": []
          }
        },
        {
          "multi_match": {
            "query": "current",
            "fuzziness": "1",
            "fields": []
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "highlight": {
    "type": "fvh",
    "fields": {
      "*": {}
    }
  }
}

I posted a question about this: "Highlighting with fvh taking long time in Elasticsearch"


(Mark Harwood) #12

Try taking away request elements until you find the culprit. Wildcards and highlighting are both potential performance hogs. You could also try the unified highlighter type. "Fast Vector Highlighter" isn't always as fast as the name suggests.
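
One way to do that systematically is to generate variants of the request with one element removed or changed, then time each of them. A sketch in Python follows. The helper name is made up, the sample field list is hypothetical, and sending the requests is left to whatever client you use.

```python
import copy


def ablation_variants(body):
    """Yield (label, request body) pairs, each with one element removed or changed."""
    yield "original", copy.deepcopy(body)

    if "highlight" in body:
        # Drop highlighting entirely.
        variant = copy.deepcopy(body)
        del variant["highlight"]
        yield "no highlight", variant

        # Keep highlighting but switch to the unified highlighter.
        variant = copy.deepcopy(body)
        variant["highlight"]["type"] = "unified"
        yield "unified highlighter", variant

    # Drop each should clause in turn (only when there is more than one).
    should = body.get("query", {}).get("bool", {}).get("should", [])
    if isinstance(should, list) and len(should) > 1:
        for index, clause in enumerate(should):
            variant = copy.deepcopy(body)
            del variant["query"]["bool"]["should"][index]
            kind = next(iter(clause))  # e.g. "multi_match" or "query_string"
            yield f"without should[{index}] ({kind})", variant


if __name__ == "__main__":
    # Hypothetical request shaped like the one in the previous post.
    request = {
        "query": {
            "bool": {
                "should": [
                    {"multi_match": {"query": "current", "fields": ["TITLE"]}},
                    {"query_string": {"query": "*current*", "fields": ["TITLE"]}},
                ],
                "minimum_should_match": 1,
            }
        },
        "highlight": {"type": "fvh", "fields": {"*": {}}},
    }
    for label, variant in ablation_variants(request):
        print(label, "->", sorted(variant))
```

Comparing the "took" value of each response should point to whether the wildcard clause or the highlighter is responsible for most of the time.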


(Nikesh) #13

My project demands fuzziness, wildcards, and highlighting. Indeed, unified is faster than fvh. Will the unified highlighter be deprecated in the near future? Unified takes around 4571 ms for a single document. Is that because of the edge n-gram analyzer I have used?


(Mark Harwood) #14

This is straying from the original topic. In the interests of keeping things focused, I suggest opening another issue to concentrate on questions around highlighter performance. It would help to do some investigation with your data and settings first, e.g. the effects of ngram sizes and numbers of fields; otherwise you'll just be waiting for someone to ask you for that additional information before they can offer a diagnosis.


(Nikesh) #15

Thank you


(system) #16

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.