Help to understand match fields

libertey · November 10, 2023, 9:03am

I have a problem: I have an ES database with a huge amount of text, and I am trying to understand why one article is not found.

Here are our ES index settings:

{
	"stories": {
		"aliases": {},
		"mappings": {
			"stories": {
				"properties": {
					"author": {
						"type": "text",
						"fields": {
							"unstemmed": {
								"type": "text",
								"analyzer": "standard_unstemmed"
							}
						},
						"analyzer": "standard_unstemmed"
					},
					"body": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"description": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"id": {
						"type": "long"
					},
					"issue_id": {
						"type": "integer"
					},
					"issue_num": {
						"type": "text"
					},
					"page": {
						"type": "integer"
					},
					"pdf_name": {
						"type": "text",
						"analyzer": "standard"
					},
					"publication_id": {
						"type": "integer"
					},
					"title": {
						"type": "text",
						"term_vector": "yes",
						"analyzer": "standard"
					},
					"year": {
						"type": "integer"
					}
				}
			}
		},
		"settings": {
			"index": {
				"number_of_shards": "5",
				"provided_name": "stories",
				"creation_date": "1651220881876",
				"analysis": {
					"filter": {
						"german_stemmer": {
							"name": "light_german",
							"type": "stemmer"
						},
						"synonym_filter": {
							"type": "synonym",
							"synonyms_path": "/var/elasticsearch/synonyms/default"
						}
					},
					"analyzer": {
						"standard": {
							"filter": [
								"lowercase",
								"german_stemmer",
								"synonym_filter"
							],
							"tokenizer": "standard"
						},
						"standard_unstemmed": {
							"filter": [
								"lowercase"
							],
							"tokenizer": "standard"
						}
					}
				},
				"number_of_replicas": "1",
				"uuid": "GwFIUXUVSp-XLZyPN6gx4w",
				"version": {
					"created": "6082099",
					"upgraded": "6082399"
				}
			}
		}
	}
}

Now I have one entry in ES whose body contains the following:

"body": {
...

Zahnfehlbildungen dieses Medikament, was zu neuem Zahndurchbruch führte.\nWeiterführende Untersuchungen an Frettchen zeigten, dass\ndie Verabreichung des Medikaments zu einem zusätzlichen\nSchneidezahn führte. Da dieser neue Zahn zwischen den be-\n\nco\nm\n\nQuellen:\nZWP online \/ THE MAINICHI NEWSPAPERS\n\nPermadental  verstärkt sein Team\nRainer Woyna verfügt über 25 Jahre Berufserfahrung in der Dentalbranche.\nDie Permadental GmbH als einer der führenden Anbieter\nvon Zahnersatz in Deutschland gehört zur international\nerfolgreichen Modern Dental Group. Produktionsstätten\nin Deutschland, den Niederlanden und Asien ermöglichen es, durch innovative
...
}

// The phrase in question is in here:

\nZWP online \/ THE MAINICHI NEWSPAPERS\n\n **Permadental**

We are searching for the word Permadental with this match phrase query:

	"query": {
		"bool": {
			"must": {
				"multi_match": {
					"query": "Permadental",
					"type": "phrase_prefix",
					"fields": [
						"title^2",
						"author.unstemmed",
						"body.unstemmed",
						"description"
					]
				}
			}
		}
	}

It is not found. If I remove the unstemmed from body, it is found. Unfortunately I inherited this code from a colleague who has left, and I don't really understand what the keyword field means there. I can't tell whether it is a filter or an analyzer, and I haven't found anything about it in the documentation.

Or maybe it is related to the ignore_above entry on the keyword, but I don't think that is the case because, as I said, with the unstemmed removed it works.

So if you have any resources for clarification or for diving deeper into this, that would be awesome.

Best Regards

libertey · November 10, 2023, 9:48am

I think I know why I found no info about that in the documentation: it is like a custom analyzer for that specific field, and it is not declared in the mapping properties. When I started to take over our search there were also title.basic and body.unstemmed, so it might just be a mistake by the last developer, and the search never worked for title and body, at least for that index.
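
A quick way to confirm this kind of suspicion is to compare the field names used in the query against the mapping. The following is only a minimal sketch: the helper names `field_is_mapped` and `unmapped_fields` are made up for illustration, and `properties` is an abbreviated hand copy of the mapping posted above, not something fetched from the cluster.

```python
def field_is_mapped(properties, path):
    """Return True if a dotted field path exists in a mapping's "properties" dict."""
    parts = path.split(".")
    node = properties.get(parts[0])
    for part in parts[1:]:
        if node is None:
            return False
        # A sub-name can be a multi-field ("fields") or an object property ("properties").
        children = {**node.get("properties", {}), **node.get("fields", {})}
        node = children.get(part)
    return node is not None


def unmapped_fields(properties, query_fields):
    """List query fields (boost suffix such as ^2 stripped) that the mapping does not define."""
    names = [field.split("^", 1)[0] for field in query_fields]
    return [name for name in names if not field_is_mapped(properties, name)]


# Abbreviated copy of the "properties" section from the index settings above.
properties = {
    "author": {
        "type": "text",
        "fields": {"unstemmed": {"type": "text", "analyzer": "standard_unstemmed"}},
        "analyzer": "standard_unstemmed",
    },
    "body": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "description": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "title": {"type": "text", "term_vector": "yes", "analyzer": "standard"},
}

# The "fields" list from the multi_match query above.
query_fields = ["title^2", "author.unstemmed", "body.unstemmed", "description"]

missing = unmapped_fields(properties, query_fields)
print(missing)
```

With the mapping and query from this thread, the only entry reported as missing is `body.unstemmed`.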

RabBit_BR · November 10, 2023, 2:08pm

Hi @libertey

Looking at your mapping, only the "author" field has the "unstemmed" sub-field; the "body" field does not. Maybe that's why the query works when you just use "body".
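
If the intent was for "body" to have the same sub-field as "author", one possible fix is to add an "unstemmed" multi-field to it. The sketch below only builds the JSON body for such a mapping update; it assumes the sub-field should use the existing `standard_unstemmed` analyzer, and the helper name `add_unstemmed_subfield` is hypothetical. On an ES 6.x index like this one, the body would presumably be sent with the put mapping API (`PUT stories/_mapping/stories`).

```python
import json


def add_unstemmed_subfield(field_mapping, analyzer="standard_unstemmed"):
    """Return a copy of a text field mapping with an extra "unstemmed" multi-field."""
    updated = dict(field_mapping)
    # Copy the existing multi-fields so the original mapping dict is left untouched.
    fields = dict(field_mapping.get("fields", {}))
    fields["unstemmed"] = {"type": "text", "analyzer": analyzer}
    updated["fields"] = fields
    return updated


# Current "body" mapping, copied from the index settings in the question.
body_mapping = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}

# Request body for the mapping update (assumption: sent via the put mapping API).
request_body = {"properties": {"body": add_unstemmed_subfield(body_mapping)}}
print(json.dumps(request_body, indent=2))
```

Keep in mind that documents indexed before the mapping change would not have the new sub-field populated until they are reindexed, so the article would still not match on `body.unstemmed` right after the update.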

system · December 8, 2023, 2:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
