I have a problem: I have an Elasticsearch database with a huge amount of text, and I am trying to understand why one article is not found.
Here are our ES index settings:
{
  "stories": {
    "aliases": {},
    "mappings": {
      "stories": {
        "properties": {
          "author": {
            "type": "text",
            "fields": {
              "unstemmed": {
                "type": "text",
                "analyzer": "standard_unstemmed"
              }
            },
            "analyzer": "standard_unstemmed"
          },
          "body": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "description": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "id": {
            "type": "long"
          },
          "issue_id": {
            "type": "integer"
          },
          "issue_num": {
            "type": "text"
          },
          "page": {
            "type": "integer"
          },
          "pdf_name": {
            "type": "text",
            "analyzer": "standard"
          },
          "publication_id": {
            "type": "integer"
          },
          "title": {
            "type": "text",
            "term_vector": "yes",
            "analyzer": "standard"
          },
          "year": {
            "type": "integer"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "stories",
        "creation_date": "1651220881876",
        "analysis": {
          "filter": {
            "german_stemmer": {
              "name": "light_german",
              "type": "stemmer"
            },
            "synonym_filter": {
              "type": "synonym",
              "synonyms_path": "/var/elasticsearch/synonyms/default"
            }
          },
          "analyzer": {
            "standard": {
              "filter": [
                "lowercase",
                "german_stemmer",
                "synonym_filter"
              ],
              "tokenizer": "standard"
            },
            "standard_unstemmed": {
              "filter": [
                "lowercase"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "GwFIUXUVSp-XLZyPN6gx4w",
        "version": {
          "created": "6082099",
          "upgraded": "6082399"
        }
      }
    }
  }
}
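Walking the mapping is one way to see which field names it actually defines. Entries under a field's `fields` key are multi-fields. These are extra sub-fields indexed from the same source value and queried as `parent.child` (for example `body.keyword`). The following is a minimal sketch that assumes Python 3 and a hand-copied subset of the mapping above. `MAPPING_PROPERTIES` and `field_paths` are hypothetical names and are not part of any Elasticsearch client.

```python
# Hand-copied subset of the "stories" mapping above (only the fields the
# query touches); paste the full "properties" object to check everything.
MAPPING_PROPERTIES = {
    "author": {
        "type": "text",
        "fields": {"unstemmed": {"type": "text", "analyzer": "standard_unstemmed"}},
        "analyzer": "standard_unstemmed",
    },
    "body": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "description": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "title": {"type": "text", "term_vector": "yes", "analyzer": "standard"},
}


def field_paths(properties, prefix=""):
    """Yield (path, type, analyzer) for every field and multi-field sub-field."""
    for name, spec in properties.items():
        path = prefix + name
        yield path, spec.get("type", "object"), spec.get("analyzer")
        # Multi-fields: extra sub-fields indexed from the same source value.
        for sub_name, sub_spec in spec.get("fields", {}).items():
            yield f"{path}.{sub_name}", sub_spec.get("type"), sub_spec.get("analyzer")
        # Object fields nest their own "properties" one level deeper.
        if "properties" in spec:
            yield from field_paths(spec["properties"], prefix=path + ".")


if __name__ == "__main__":
    for path, ftype, analyzer in field_paths(MAPPING_PROPERTIES):
        print(f"{path:22} {ftype:8} {analyzer or '(not set)'}")
```

On this subset the sketch lists `author.unstemmed`, but for `body` it lists only `body.keyword`. No `body.unstemmed` sub-field is defined.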
Now I have one entry in ES whose body contains this text:
"body": {
...
Zahnfehlbildungen dieses Medikament, was zu neuem Zahndurchbruch führte.\nWeiterführende Untersuchungen an Frettchen zeigten, dass\ndie Verabreichung des Medikaments zu einem zusätzlichen\nSchneidezahn führte. Da dieser neue Zahn zwischen den be-\n\nco\nm\n\nQuellen:\nZWP online \/ THE MAINICHI NEWSPAPERS\n\nPermadental verstärkt sein Team\nRainer Woyna verfügt über 25 Jahre Berufserfahrung in der Dentalbranche.\nDie Permadental GmbH als einer der führenden Anbieter\nvon Zahnersatz in Deutschland gehört zur international\nerfolgreichen Modern Dental Group. Produktionsstätten\nin Deutschland, den Niederlanden und Asien ermöglichen es, durch innovative
...
}
// The phrase appears in this part:
\nZWP online \/ THE MAINICHI NEWSPAPERS\n\n **Permadental**
We are searching for the word "Permadental" with this phrase_prefix multi_match query:
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "Permadental",
          "type": "phrase_prefix",
          "fields": [
            "title^2",
            "author.unstemmed",
            "body.unstemmed",
            "description"
          ]
        }
      }
    }
  }
}
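You can cross-check the query against the mapping by comparing the field names in the `multi_match` `fields` list (with boosts stripped) to the field paths the mapping defines. This is a rough sketch with three assumptions. First, the set of mapped paths is copied by hand from the mapping above. Second, wildcard field patterns, which `multi_match` also accepts, are not handled. Third, `MAPPED_FIELDS`, `QUERY_FIELDS` and `unmapped_fields` are hypothetical names.

```python
# Field paths defined by the "stories" mapping above, written out by hand:
# top-level fields plus their multi-fields declared under "fields".
MAPPED_FIELDS = {
    "author", "author.unstemmed",
    "body", "body.keyword",
    "description", "description.keyword",
    "id", "issue_id", "issue_num", "page", "pdf_name",
    "publication_id", "title", "year",
}

# The "fields" list from the multi_match query above.
QUERY_FIELDS = ["title^2", "author.unstemmed", "body.unstemmed", "description"]


def unmapped_fields(query_fields, mapped):
    """Return query fields (boost suffix like ^2 stripped) that are not mapped."""
    missing = []
    for field in query_fields:
        name = field.split("^", 1)[0]  # "title^2" -> "title"
        if name not in mapped:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(unmapped_fields(QUERY_FIELDS, MAPPED_FIELDS))
```

For the query above, the sketch reports `body.unstemmed`. An unmapped field name most likely contributes no matches rather than raising an error. That would fit the observation that the article is found only when plain `body` is searched.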
With this query the article is not found, but if I remove the ".unstemmed" from body, it is found. Unfortunately, I took over this code from a colleague who has left, and I don't really understand what the "keyword" field means there. I can't tell whether it is a filter or an analyzer, and I haven't found anything about it in the documentation.
Or is it related to the ignore_above entry on keyword? I don't think so because, as I said, it works once ".unstemmed" is removed.
So if you have any resources that clarify this and let me dive deeper, that would be awesome.
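About the `keyword` entry: under `fields` it is neither a filter nor an analyzer. It is a sub-field of type `keyword`, which indexes the whole value as one exact, un-analyzed term. `ignore_above: 256` means values longer than 256 characters are not indexed into that sub-field. Only `body.keyword` carries `ignore_above`, so it does not affect searches on `body` itself. The following is a simplified model of that behavior, not Elasticsearch code. It assumes Python 3 and ignores normalizers and array values. `keyword_terms` is a hypothetical helper.

```python
IGNORE_ABOVE = 256  # value from the body.keyword / description.keyword mapping


def keyword_terms(value, ignore_above=IGNORE_ABOVE):
    """Simplified model of a keyword sub-field with ignore_above.

    The whole string becomes one exact term (no tokenizing, no lowercasing);
    strings longer than ignore_above characters are not indexed in this
    sub-field at all.
    """
    return [value] if len(value) <= ignore_above else []


if __name__ == "__main__":
    short = "Permadental verstärkt sein Team"
    long_body = "Die Permadental GmbH als einer der führenden Anbieter " * 10
    print(keyword_terms(short))      # one exact term, case preserved
    print(keyword_terms(long_body))  # empty: too long for body.keyword
```

A long article body therefore usually has no term in `body.keyword`. It is still fully searchable through the analyzed `body` text field.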
Best regards