I am searching over emails whose attachments I've ingested using the ingest attachment pipeline. Many emails have multiple attachments, and sometimes it is the content of an attachment itself (the extracted text) that satisfies the query.
My question is: how do I get the filename of the attachment that matched? I understand how to get the highlighted content of the matched field, but that is different from the filename.
The mapping I'm using for attachments is just the set of fields that the ingest-attachment plugin provides, populated using a foreach pipeline:
"attachments" : {
  "properties" : {
    "attachment" : {
      "properties" : {
        "author" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "content" : {
          "type" : "text"
        },
        "content_length" : {
          "type" : "long"
        },
        "content_type" : {
          "type" : "keyword"
        },
        "date" : {
          "type" : "date"
        },
        "language" : {
          "type" : "keyword"
        }
      }
    },
    "data" : {
      "type" : "object",
      "enabled" : false
    },
    "filename" : {
      "type" : "keyword"
    }
  }
}
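For context, a foreach pipeline that would produce these fields looks roughly like the sketch below. It follows the array example in the ingest-attachment documentation; the pipeline name `attachment` and the description text are assumptions for illustration, not necessarily the exact pipeline in use. It assumes each element of `attachments` carries the base64-encoded file in `data` alongside its `filename`.

```json
PUT _ingest/pipeline/attachment
{
  "description": "Illustrative sketch (assumed name and description): extract text from each element of the attachments array",
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "field": "_ingest._value.data",
            "target_field": "_ingest._value.attachment"
          }
        }
      }
    }
  ]
}
```

With a pipeline like this, each array element ends up with its own `attachment.content` next to its `filename`, which is why the filename and the matching text live in the same object in the source document.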
Currently I can get the highlighted content of the attachments with a query like this:
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "_all",
            "query": "\"linkedin\"",
            "_name": "all fields"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 1,
  "highlight": {
    "fields": {
      "attachments.filename": {},
      "attachments.attachment.content": {}
    },
    "require_field_match": false
  },
  "_source": {
    "excludes": [
      "attachments.attachment.content",
      "attachments.data"
    ]
  }
}
But as stated above, this isn't what I need.
Thanks!