Multi-field search problems with minimum-should-match


(reza.ro) #1

Hi,

I am having issues in our search extension, which uses Elasticsearch 2.3.4, with a multi-field query that searches a document's title, description and some metadata at the same time. For example, when a user looks for a document with the title "inductive sensors" and the metadata (feature) "300 Hz", she would enter the search term "inductive sensors 300 hz", and the simplified query would look something like this:

GET _search
{
	"query": {
	  "bool": {
		"should": [
		  {
			"nested": {
			  "query": {
				"multi_match": {
				  "type": "most_fields",
				  "query": "inductive sensors 300 hz",
				  "minimum_should_match": "80%",
				  "operator": "or",
				  "fields": [
					"children.title_en.whitespaceAnalyzer^8",
					"children.title_en.getSynonymsAnalyzer^9"
				  ]
				}
			  },
			  "path": "children",
			  "boost": 1
			}
		  },
		  {
			"nested": {
			  "query": {
				"match": {
				  "children.attributes.attributeAnalyzer": {
					"query": "inductive sensors 300 hz",
					"minimum_should_match": "30%",
					"operator": "or",
					"boost": 1.0
				  }
				}
			  },
			  "path": "children",
			  "boost": 5.0
			}
		  }
		]
	  }
	}
}

I am looking for a way to rank all documents whose title contains "inductive sensor" at the top, and to prioritize those that also have the "300 hz" attribute so that they appear above all the others.

The problem is that the multi-field query runs the same search term against all of the fields, while only part of the search term is intended for each field. In addition, minimum-should-match for the title_en field is 80%, and since only 50% of the words in the search term target the title_en field in this case, there will be no match from this field.
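
To make the arithmetic concrete, here is a minimal Python sketch of how I understand the thresholds to work. It assumes that each analyzer splits the search term into four tokens and that a positive percentage is rounded down, as the Elasticsearch documentation describes for minimum_should_match. The helper names and the sample token sets are hypothetical and only illustrate the counting, not the real scoring.

```python
def required_matches(term_count: int, percent: int) -> int:
    """Number of query terms that must match for a positive percentage.

    Assumption: the value computed from the percentage is rounded down.
    """
    return term_count * percent // 100


def clause_matches(query_terms, field_terms, percent: int) -> bool:
    """True if enough query terms occur in the field to satisfy the threshold."""
    matched = sum(1 for term in query_terms if term in field_terms)
    return matched >= required_matches(len(query_terms), percent)


# Assumption: a whitespace-style analyzer yields four tokens for this search term.
query_terms = "inductive sensors 300 hz".split()

# Hypothetical tokens for one document's title and attribute fields.
title_terms = {"inductive", "sensors"}
attribute_terms = {"300", "hz"}

print(required_matches(len(query_terms), 80))            # 3 terms needed in the title
print(clause_matches(query_terms, title_terms, 80))      # False: only 2 of 4 match
print(required_matches(len(query_terms), 30))            # 1 term needed in the attributes
print(clause_matches(query_terms, attribute_terms, 30))  # True: 2 of 4 match
```

Under these assumptions the title clause needs three matching terms but can only ever get two, while the attribute clause needs just one.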

On the other hand, the attribute sub-query, with a minimum-should-match of 30%, does match, so documents containing only the "300 hz" attribute are returned. What might be a possible solution for this problem?

Reducing the value of minimum-should-match is not an option, because if the user's query targets only the title_en field, the precision (as opposed to recall) will not be acceptable.


(reza.ro) #2

ping

