Ingest attachment - missing attachment field in results


(Shawn Mullen-2) #1

I have read everything I could find, but I have not been able to get the ingest attachment plugin to work. I am using the mapping below:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lower_keyword_analyzer": {
          "type":      "custom",
          "tokenizer": "keyword",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase"
          ]
        }
      },
      "char_filter":{
        "punc_filter":{
          "type":"mapping",
          "mappings":[
            "- => ",
            ", => ",
            "( => ",
            ") => ",
            "' => ",
            "[ => ",
            "] => ",
            "\" => ",
            "$ => ",
            "& => ",
            ": => ",
            "; => ",
            ". => ",
            "* => ",
            "= => ",
            "+ => ",
            "^ => ",
            "% => ",
            "# => ",
            "@ => ",
            "! => ",
            "~ => ",
            "? => "
            ]
        }
      },
      "normalizer":{
        "sortnormalizer":{
          "type":"custom",
          "char_filter":["punc_filter"],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings":{
        "my_document":{
            "_all":{"enabled":false},
            "properties": {
                "collection_identifier":
                {
                   "type":"keyword"
                },
                "document_file_name":
                {
                   "type":"text",
                   "analyzer":"lower_keyword_analyzer"
                },
                "document_type":
                {
                   "type":"keyword"
                },
                "document_title":
                {
                   "type":"text",
                   "analyzer":"lower_keyword_analyzer",
                   "fields":{
                     "sort":{
                       "type":"keyword",
                       "normalizer": "sortnormalizer"
                     },
                     "case_sensitive":{
                       "type":"keyword"
                     }
                   }
                },
                "attachment":{
                  "properties":{
                    "content": {"type":"text","store": true}
                  }
                },
                "document_categories":
                {
                   "type":"text",
                   "analyzer":"lower_keyword_analyzer"
                },
                "document_tags":
                {
                   "type":"text",
                   "analyzer":"standard"
                },
                "document_created_at":
                {
                   "type":"date",
                   "format":"epoch_millis"
                },
                "document_updated_at":
                {
                   "type":"date",
                   "format":"epoch_millis"
                },
                "data":{
                  "type":"text"
                }
            }
        }
  }
}

I have configured the pipeline using:

{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars": -1,
        "properties":["content"]
      }
    }
  ]
}
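
One way to confirm that the pipeline itself works, independently of the bulk requests, is the `_ingest/pipeline/_simulate` endpoint, which accepts an inline pipeline plus sample documents. The sketch below only builds the request body in Node.js; the sample text and the HTTP client you would use to send it are assumptions, not part of the original setup. If the processor is working, each entry in the response's `docs` array should carry an `attachment` object next to `data` in its `_source`.

```javascript
// Build a body for POST /_ingest/pipeline/_simulate.
// The pipeline definition mirrors the one above; the sample content is hypothetical.
const attachmentPipeline = {
  description: "Extract attachment information",
  processors: [
    {
      attachment: {
        field: "data",
        indexed_chars: -1,
        properties: ["content"],
      },
    },
  ],
};

// Wrap each base64 string in a minimal document with a "data" field.
function buildSimulateBody(pipeline, base64Samples) {
  return {
    pipeline,
    docs: base64Samples.map((data) => ({ _source: { data } })),
  };
}

// A plain-text payload is enough to exercise the processor.
const sample = Buffer.from("Hello attachment", "utf8").toString("base64");
const body = buildSimulateBody(attachmentPipeline, [sample]);
console.log(JSON.stringify(body, null, 2));
```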

I am using the bulk API from NodeJS. After indexing (no errors are returned), I run a search (GET /_search); the results include the base64-encoded data, but there is no attachment field. I must be missing something, because no one else seems to be having this issue. Any ideas?
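
A quick way to check whether any documents actually went through the pipeline is an `exists` query on `attachment.content`, excluding the large base64 `data` field from `_source` so the hits stay readable. The sketch below only constructs the search body and inspects a response's hits; the helper names and sample hits are hypothetical, and how you send the request from NodeJS is left open.

```javascript
// Search body for GET /_search: return only documents that have an extracted
// attachment.content field, and leave the large base64 "data" field out of _source.
function buildAttachmentCheckQuery() {
  return {
    query: { exists: { field: "attachment.content" } },
    _source: { excludes: ["data"] },
  };
}

// Given the hits.hits array from a plain search response, list the IDs of
// documents whose _source has no attachment object (as when the pipeline did not run).
function idsMissingAttachment(hits) {
  return hits
    .filter((hit) => !(hit._source && hit._source.attachment))
    .map((hit) => hit._id);
}

// Hypothetical hits, shaped like the "hits.hits" array of a search response.
const sampleHits = [
  { _id: "1", _source: { data: "JVBERi0=", attachment: { content: "TRIFLURALIN" } } },
  { _id: "2", _source: { data: "JVBERi0=" } },
];
console.log(JSON.stringify(buildAttachmentCheckQuery()));
console.log(idsMissingAttachment(sampleHits)); // [ '2' ]
```

If the `exists` query returns zero hits, no indexed document has the extracted field at all, which points at the pipeline not being applied rather than at the mapping.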


(David Pilato) #2

Could you share a typical document you are sending?


(Shawn Mullen-2) #3
{"document_identifiers":["1582-09-8"],"collection_identifier":"1","document_file_name":"actadmission.pdf","document_type":"pdf","document_title":"TRIFLURALIN","document_categories":["Hazard Assessment/Summary Document(s)"],"document_tags":["1582-09-8"],"document_created_at":1223683200000,"document_updated_at":1467465970727,"data":"[BASE64 ENCODED PDF DATA]"}
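
For reference, the `data` value in a document like this is just the file's bytes encoded as base64, which NodeJS's `Buffer` handles directly. The sketch below builds a document of the same shape from any byte buffer; the helper name `buildDocument` and the reduced set of fields are illustrative assumptions, and in practice the bytes would come from something like `fs.readFileSync` on the PDF.

```javascript
// Build a document shaped like the one above from raw file bytes.
function buildDocument(fileName, fileBytes, extraFields) {
  return {
    document_file_name: fileName,
    // Use the file extension as the document type, e.g. "pdf".
    document_type: fileName.split(".").pop(),
    ...extraFields,
    // The attachment processor reads base64 from the field named in the pipeline ("data").
    data: Buffer.from(fileBytes).toString("base64"),
  };
}

// Hypothetical bytes standing in for a real PDF (only the "%PDF-" header).
const doc = buildDocument("actadmission.pdf", Buffer.from("%PDF-", "latin1"), {
  collection_identifier: "1",
  document_title: "TRIFLURALIN",
});
console.log(doc.data); // JVBERi0=
```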

(Shawn Mullen-2) #4

OK, well, I feel a bit stupid, but I figured out the issue. As I said, I am using the bulk API. However, I was submitting the documents as "update" actions instead of "index" actions.
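
For anyone hitting the same problem: the attachment field appeared once each document was sent as an `index` action rather than an `update` action, which is consistent with ingest pipelines not being applied to update operations. A minimal sketch of building such a bulk body by hand in NodeJS follows; the pipeline ID `attachment` and the index name `documents` are hypothetical (use whatever names you registered), and `_type` reflects the `my_document` mapping type used above.

```javascript
// Build an NDJSON body for POST /_bulk?pipeline=attachment (pipeline ID is hypothetical).
// Each document gets an "index" action line followed by its source line; in this
// thread, "update" actions did not produce the attachment field.
function buildBulkIndexBody(indexName, typeName, docs) {
  const lines = [];
  for (const { id, source } of docs) {
    lines.push(JSON.stringify({ index: { _index: indexName, _type: typeName, _id: id } }));
    lines.push(JSON.stringify(source));
  }
  // The bulk API requires the body to end with a newline.
  return lines.join("\n") + "\n";
}

const bulkPath = "/_bulk?pipeline=attachment";
const bulkBody = buildBulkIndexBody("documents", "my_document", [
  { id: "1", source: { document_file_name: "actadmission.pdf", data: "JVBERi0=" } },
]);
console.log(bulkPath);
console.log(bulkBody);
```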

