The words in the document field are included in the query text with fuzzy logic

aleksander.zwd · October 21, 2021, 9:18pm

In Elasticsearch I have an index whose documents contain a text field "name" holding first name(s) and a surname, e.g.:
Rebecca Donovan
Julian Fred Drake
Miley Angela Saunders
...

and a query containing free text, e.g.:
Mr. Julian second name Fred surname Drake living in Boston

Now I need to find the documents for which all the words in the "name" field are contained in the query text, and the word comparisons must be fuzzy. The maximum edit distance for the fuzzy comparison should be specified by the client in the query.

My question: is this even possible with Elasticsearch?

I made a workaround, but it has some disadvantages. Here is an example query for this solution, assuming that the "name" field of any document contains at most 3 words:

curl -H "Content-Type: application/json" -XPOST '127.0.0.1:9200/person-index/_search?pretty&size=10' -d '
{
	"query": {
		"bool": {
			"should": [
				{
					"bool": {
						"must": {
								"match": {
									"name": {
										"query": "Mr. Julian second name Fred surname Drake living in Boston",
										"minimum_should_match": 1,
										"fuzziness": 1
									}
								}
							},
						"filter": { "term" : {"name.length": 1} }
					}
				},
				{
					"bool": {
						"must": {
								"match": {
									"name": {
										"query": "Mr. Julian second name Fred surname Drake living in Boston",
										"minimum_should_match": 2,
										"fuzziness": 1
									}
								}
							},
						"filter": { "term" : {"name.length": 2} }
					}
				},
				{
					"bool": {
						"must": {
								"match": {
									"name": {
										"query": "Mr. Julian second name Fred surname Drake living in Boston",
										"minimum_should_match": 3,
										"fuzziness": 1
									}
								}
							},
						"filter": { "term" : {"name.length": 3} }
					}
				}
			],
			"minimum_should_match": 1
		}
	}
}
'
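The three "should" clauses differ only in the word count, so the request body can be generated instead of written by hand. A minimal sketch in Python; the function name build_name_query and its parameters are hypothetical, and the field names are taken from the query above:

```python
import json

def build_name_query(text, max_words, fuzziness):
    """Build one should-clause per possible word count of the "name" field."""
    clauses = []
    for n in range(1, max_words + 1):
        clauses.append({
            "bool": {
                "must": {
                    "match": {
                        "name": {
                            "query": text,
                            # require n matching query terms ...
                            "minimum_should_match": n,
                            "fuzziness": fuzziness,
                        }
                    }
                },
                # ... only for documents whose name has exactly n tokens
                "filter": {"term": {"name.length": n}},
            }
        })
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}

body = build_name_query(
    "Mr. Julian second name Fred surname Drake living in Boston",
    max_words=3,
    fuzziness=1,
)
print(json.dumps(body, indent=2))
```

This only shortens the client code; the request sent to Elasticsearch is the same and still grows with the maximum word count.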

mappings:

{
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "whitespace",
                "fields": {
                    "length": {
                        "type": "token_count",
                        "analyzer": "whitespace",
                        "store": true
                    }
                }
            }
        }
    }
}

The query is long, and its size depends on the maximum number of words in the indexed documents.
Additionally, with the example above, the following query text matches the second document in the list (Julian Fred Drake), although it should not:
Mr. Julian second name Jullan surname Drake living in Boston
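A likely explanation for this false positive: "minimum_should_match" counts the query terms that match, not the distinct document words that are covered. Both "Julian" and "Jullan" hit the document word "Julian", so together with "Drake" three query terms match even though "Fred" is missing. The sketch below reproduces that counting outside Elasticsearch. It is a simplification: it splits on whitespace and uses plain Levenshtein distance, whereas Elasticsearch by default also counts a transposition as one edit, and the helper names are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def matching_query_terms(name, text, max_edits):
    """What the workaround counts: query terms that fuzzily hit any name word."""
    doc = name.split()
    return sum(any(edit_distance(q, d) <= max_edits for d in doc) for q in text.split())

def all_name_words_covered(name, text, max_edits):
    """The actual requirement: every name word has a fuzzy match in the query text."""
    query = text.split()
    return all(any(edit_distance(d, q) <= max_edits for q in query) for d in name.split())

name = "Julian Fred Drake"
text = "Mr. Julian second name Jullan surname Drake living in Boston"
print(matching_query_terms(name, text, 1))    # 3: Julian, Jullan, Drake
print(all_name_words_covered(name, text, 1))  # False: nothing matches "Fred"
```

A check like all_name_words_covered could be applied on the client to the hits returned by the workaround query, at the cost of fetching candidates that are later discarded.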

My question: is it possible to meet these requirements in a simpler and better way?

system · November 18, 2021, 9:18pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
