I have read everything I could find, but I have not been able to get the ingest attachment plugin to work. I am using the mapping below:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lower_keyword_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase"
          ]
        }
      },
      "char_filter": {
        "punc_filter": {
          "type": "mapping",
          "mappings": [
            "- => ",
            ", => ",
            "( => ",
            ") => ",
            "' => ",
            "[ => ",
            "] => ",
            "\" => ",
            "$ => ",
            "& => ",
            ": => ",
            "; => ",
            ". => ",
            "* => ",
            "= => ",
            "+ => ",
            "^ => ",
            "% => ",
            "# => ",
            "@ => ",
            "! => ",
            "~ => ",
            "? => "
          ]
        }
      },
      "normalizer": {
        "sortnormalizer": {
          "type": "custom",
          "char_filter": ["punc_filter"],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_document": {
      "_all": {"enabled": false},
      "properties": {
        "collection_identifier": {
          "type": "keyword"
        },
        "document_file_name": {
          "type": "text",
          "analyzer": "lower_keyword_analyzer"
        },
        "document_type": {
          "type": "keyword"
        },
        "document_title": {
          "type": "text",
          "analyzer": "lower_keyword_analyzer",
          "fields": {
            "sort": {
              "type": "keyword",
              "normalizer": "sortnormalizer"
            },
            "case_sensitive": {
              "type": "keyword"
            }
          }
        },
        "attachment": {
          "properties": {
            "content": {"type": "text", "store": true}
          }
        },
        "document_categories": {
          "type": "text",
          "analyzer": "lower_keyword_analyzer"
        },
        "document_tags": {
          "type": "text",
          "analyzer": "standard"
        },
        "document_created_at": {
          "type": "date",
          "format": "epoch_millis"
        },
        "document_update_at": {
          "type": "date",
          "format": "epoch_millis"
        },
        "data": {
          "type": "text"
        }
      }
    }
  }
}
I have configured the pipeline using:
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "data",
        "indexed_chars": -1,
        "properties": ["content"]
      }
    }
  ]
}
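One way to check a pipeline like this in isolation is the simulate endpoint (`POST /_ingest/pipeline/<id>/_simulate`), which runs the processors against sample documents without indexing anything. The following is a minimal sketch that only builds such a request in Node.js. The pipeline id `attachment` and the helper name `buildSimulateRequest` are assumptions for illustration, since the post does not say what id the pipeline was registered under.

```javascript
// Hypothetical pipeline id: the post does not state the id used at registration.
const PIPELINE_ID = 'attachment';

// Build (but do not send) a simulate request for one sample document.
// The sample document carries the base64 payload in "data", the field
// the attachment processor above is configured to read.
function buildSimulateRequest(pipelineId, base64Data) {
  return {
    method: 'POST',
    path: '/_ingest/pipeline/' + encodeURIComponent(pipelineId) + '/_simulate',
    body: JSON.stringify({ docs: [{ _source: { data: base64Data } }] }),
  };
}

const simulateRequest = buildSimulateRequest(
  PIPELINE_ID,
  Buffer.from('hello world').toString('base64')
);
console.log(simulateRequest.method, simulateRequest.path);
console.log(simulateRequest.body);
```

If the simulate response shows an `attachment.content` field, the pipeline definition itself is probably fine and the problem is more likely in how documents are sent for indexing.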
I am indexing with the bulk API from Node.js. Indexing returns no errors, but when I search (GET /_search), the results include the base64-encoded data and no attachment field. I must be missing something, because no one else seems to be having this issue. Any ideas?
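The bulk request described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual code from this setup: the index name `my_index`, the pipeline id `attachment`, and the helper `buildBulkRequest` are all hypothetical. An ingest pipeline only runs when the indexing request names it, so the sketch passes it through the `pipeline` query parameter on `_bulk`; a request without it would index the documents unchanged and return no error.

```javascript
// Hypothetical names: "my_index" and "attachment" are assumptions for illustration.
// Build (but do not send) a _bulk request whose documents go through a pipeline.
function buildBulkRequest(indexName, typeName, pipelineId, docs) {
  const lines = [];
  for (const doc of docs) {
    // Action line, then the document source, one JSON object per line.
    lines.push(JSON.stringify({ index: { _index: indexName, _type: typeName } }));
    lines.push(JSON.stringify(doc));
  }
  return {
    method: 'POST',
    // Without ?pipeline=..., the documents are indexed without running the processor.
    path: '/_bulk?pipeline=' + encodeURIComponent(pipelineId),
    headers: { 'Content-Type': 'application/x-ndjson' },
    // The bulk body must end with a newline.
    body: lines.join('\n') + '\n',
  };
}

const bulkRequest = buildBulkRequest('my_index', 'my_document', 'attachment', [
  { document_title: 'Example', data: Buffer.from('hello world').toString('base64') },
]);
console.log(bulkRequest.path);
console.log(bulkRequest.body);
```

The returned object can be handed to whatever HTTP client or Elasticsearch client is in use; the sketch deliberately stops short of sending it so that it does not depend on a particular client library.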