Can the Attachment type avoid storing the base64 source?


(昕玫) #1

Hi, I'm new to ES. Our project deals with email.

Recently we started indexing attachments with the Attachment type. What confuses us is that, although we only need the content to be indexed, ES still stores the file's base64 string in _source. I am afraid this may take up a lot of disk space.

Is there a way to solve this? Is it possible to index the content and discard the base64 string?

Thanks,
Shiny ke


(Jun Ohtani) #2

Hi shinyke,

You can control which fields are included in or excluded from _source in your mappings.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#include-exclude

Example:

PUT /attachment_sample
{
  "mappings": {
    "files": {
      "_source": {
        "excludes" : ["my_attachments"]
      }, 
      "properties": {
        "my_attachments" : { "type": "attachment" },
        "title": { "type": "string" }
      }
    }
  }
}

PUT /attachment_sample/files/1
{
  "title": "fuga",
  "my_attachments" : "...base64 string..."
}
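
In case it helps, here is a minimal sketch of how the request body above could be built from raw file bytes. It uses only the Python standard library; the function name `build_document` and the sample bytes are made up for illustration, and sending the body to ES is left out.

```python
import base64
import json


def build_document(title, raw_bytes, attachment_field="my_attachments"):
    """Return a document body like the one in the PUT request above.

    The attachment field carries the file content as a base64 string.
    `attachment_field` defaults to the field name used in this thread.
    """
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {"title": title, attachment_field: encoded}


if __name__ == "__main__":
    # Hypothetical file content; in practice this would be read from disk.
    sample = b"Hello from an email attachment"
    print(json.dumps(build_document("fuga", sample), indent=2))
```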

GET /attachment_sample/files/_search
# The _source in the response then contains only title:
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "attachment_sample",
            "_type": "files",
            "_id": "1",
            "_score": 1,
            "_source": {
               "title": "fuga"
            }
         }
      ]
   }
}
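
To make the effect of "excludes" concrete, here is a toy model in Python. It is a simplified sketch, not how Elasticsearch implements source filtering: it only looks at top-level field names and matches them with shell-style wildcards, and the function name `stored_source` is hypothetical. The excluded field is still indexed and searchable; it is just left out of the stored _source.

```python
from fnmatch import fnmatchcase


def stored_source(document, excludes):
    """Return the part of `document` that would be kept in _source.

    Simplified assumption: only top-level field names are checked,
    and each entry in `excludes` is treated as a wildcard pattern.
    """
    return {
        field: value
        for field, value in document.items()
        if not any(fnmatchcase(field, pattern) for pattern in excludes)
    }


if __name__ == "__main__":
    doc = {"title": "fuga", "my_attachments": "...base64 string..."}
    print(stored_source(doc, ["my_attachments"]))
```

One thing to keep in mind: anything excluded from _source cannot be read back from ES later, so keep the original files somewhere else if you may need them again.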

Does that make sense?


(昕玫) #3

Yes, I tried the _source excludes setting and it works fine!

Many thanks, johtani!

