Help understanding match fields

I have a problem: I have an ES database with a huge amount of text, and I am trying to understand why one article is not found.

Here are our ES index settings:

{
	"stories": {
		"aliases": {},
		"mappings": {
			"stories": {
				"properties": {
					"author": {
						"type": "text",
						"fields": {
							"unstemmed": {
								"type": "text",
								"analyzer": "standard_unstemmed"
							}
						},
						"analyzer": "standard_unstemmed"
					},
					"body": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"description": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"id": {
						"type": "long"
					},
					"issue_id": {
						"type": "integer"
					},
					"issue_num": {
						"type": "text"
					},
					"page": {
						"type": "integer"
					},
					"pdf_name": {
						"type": "text",
						"analyzer": "standard"
					},
					"publication_id": {
						"type": "integer"
					},
					"title": {
						"type": "text",
						"term_vector": "yes",
						"analyzer": "standard"
					},
					"year": {
						"type": "integer"
					}
				}
			}
		},
		"settings": {
			"index": {
				"number_of_shards": "5",
				"provided_name": "stories",
				"creation_date": "1651220881876",
				"analysis": {
					"filter": {
						"german_stemmer": {
							"name": "light_german",
							"type": "stemmer"
						},
						"synonym_filter": {
							"type": "synonym",
							"synonyms_path": "/var/elasticsearch/synonyms/default"
						}
					},
					"analyzer": {
						"standard": {
							"filter": [
								"lowercase",
								"german_stemmer",
								"synonym_filter"
							],
							"tokenizer": "standard"
						},
						"standard_unstemmed": {
							"filter": [
								"lowercase"
							],
							"tokenizer": "standard"
						}
					}
				},
				"number_of_replicas": "1",
				"uuid": "GwFIUXUVSp-XLZyPN6gx4w",
				"version": {
					"created": "6082099",
					"upgraded": "6082399"
				}
			}
		}
	}
}

Now I have one entry in ES whose body contains the following:

"body": {
...

Zahnfehlbildungen dieses Medikament, was zu neuem Zahndurchbruch führte.\nWeiterführende Untersuchungen an Frettchen zeigten, dass\ndie Verabreichung des Medikaments zu einem zusätzlichen\nSchneidezahn führte. Da dieser neue Zahn zwischen den be-\n\nco\nm\n\nQuellen:\nZWP online \/ THE MAINICHI NEWSPAPERS\n\nPermadental  verstärkt sein Team\nRainer Woyna verfügt über 25 Jahre Berufserfahrung in der Dentalbranche.\nDie Permadental GmbH als einer der führenden Anbieter\nvon Zahnersatz in Deutschland gehört zur international\nerfolgreichen Modern Dental Group. Produktionsstätten\nin Deutschland, den Niederlanden und Asien ermöglichen es, durch innovative
...
}

// The relevant phrase in there is:

\nZWP online \/ THE MAINICHI NEWSPAPERS\n\n **Permadental**

We are searching for the word Permadental with this match phrase query:

	"query": {
		"bool": {
			"must": {
				"multi_match": {
					"query": "Permadental",
					"type": "phrase_prefix",
					"fields": [
						"title^2",
						"author.unstemmed",
						"body.unstemmed",
						"description"
					]
				}
			}
		}
	}

The article is not found. If I remove the unstemmed from body, it is found. Unfortunately I took over this code from a colleague who has left, and I don't really understand what the keyword field means in there. I can't tell whether it is a filter or an analyzer, and I have not found a resource about it in the documentation.

Or it might be related to the ignore_above entry on the keyword, but I don't think that is the case because, as I said, with unstemmed removed it works.

So if you have any resource that clarifies this and lets me dive deeper into it, that would be awesome.

Best Regards

I think I know why I found no information about this in the documentation: it looks like a custom analyzer for that specific field, but it is not declared in the mapping properties. When I started taking over our search there were also title.basic and body.unstemmed, so it might just be a mistake by the previous developer, and the search never worked for title and body, at least for that index.
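
If the previous developer did intend a body.unstemmed sub-field, it would presumably have to be declared the same way as author.unstemmed. The following is only a sketch of what such a mapping fragment could look like, written as a Python dict so it can be printed as JSON. That body should reuse the standard_unstemmed analyzer is my assumption, and as far as I understand, documents that are already indexed would have to be reindexed before the new sub-field contains anything.

```python
import json

# Hypothetical mapping fragment: adds an "unstemmed" sub-field to "body",
# mirroring the existing "author.unstemmed" definition (an assumption).
body_mapping_update = {
    "properties": {
        "body": {
            "type": "text",
            "fields": {
                # existing sub-field, kept unchanged
                "keyword": {"type": "keyword", "ignore_above": 256},
                # new sub-field using the analyzer already defined in the index settings
                "unstemmed": {"type": "text", "analyzer": "standard_unstemmed"},
            },
        }
    }
}

# Print the fragment as JSON for inspection
print(json.dumps(body_mapping_update, indent=2))
```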

Hi @libertey

Looking at your mapping, only the "author" field has an "unstemmed" sub-field; the "body" field does not. That is probably why the query works when you just use "body".
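
To check this kind of thing quickly, here is a small sketch that lists which fields of a multi_match are not declared in the mapping. The helper name mapped_fields is made up for this example, and the mapping is a trimmed copy of the one you posted. As far as I know, a query on an undeclared field simply matches nothing rather than raising an error, which would fit what you are seeing.

```python
def mapped_fields(properties, prefix=""):
    """Collect every field path, including sub-fields declared under "fields"."""
    paths = set()
    for name, spec in properties.items():
        path = prefix + name
        paths.add(path)
        # multi-fields such as author.unstemmed or body.keyword
        paths |= mapped_fields(spec.get("fields", {}), path + ".")
        # nested object properties, if any
        paths |= mapped_fields(spec.get("properties", {}), path + ".")
    return paths

# Trimmed copy of the "stories" mapping from the question
properties = {
    "author": {"type": "text", "fields": {"unstemmed": {"type": "text"}}},
    "body": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "description": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "title": {"type": "text"},
}

query_fields = ["title^2", "author.unstemmed", "body.unstemmed", "description"]

known = mapped_fields(properties)
# strip the boost suffix ("^2") before comparing against the mapping
missing = [f for f in query_fields if f.split("^")[0] not in known]
print(missing)  # ['body.unstemmed']
```

As for body.keyword: that is a sub-field of type keyword declared under "fields" (a multi-field), not a filter or an analyzer, and ignore_above only applies to that sub-field.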
