Efficient storage

Hello, I am planning on storing 1.5 TB of text files in Elasticsearch. However, I do not have a lot of space to spare beyond the text file size. I have been reading up on Elasticsearch and how to minimize storage, and I have already made the following changes:

Removed unnecessary fields
Removed _source
Enabled best_compression
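For reference, a minimal sketch of how the last two changes might be applied at index creation time (the index name my-logs is just a placeholder). Note that disabling _source means you lose the ability to reindex, update documents, or see the original document in search results:

```json
PUT my-logs
{
  "settings": {
    "index.codec": "best_compression"
  },
  "mappings": {
    "_source": {
      "enabled": false
    }
  }
}
```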

This has decreased the disk space used by Elasticsearch from 7 times the size of the text files down to 3 times their size, but I was hoping there is a way to compress this even further.

The path field is the location of the original text file and is required. The message field is currently filled with close-to-random data with very little reused text (but it must be word searchable). I have thought about splitting the message field, but I run into grok running after mutate, which means that if I mutate to remove message, grok can't parse the fields. (But that is another problem.)
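On the ordering side issue: Logstash runs filters in the order they appear in the config file, so putting the mutate after the grok in the same filter block should avoid removing message before grok has parsed it. A sketch, where messageline and the grok pattern are only illustrative:

```
filter {
  # Parse first: copy what we need out of message into its own field.
  grok {
    match => { "message" => "%{GREEDYDATA:messageline}" }
  }
  # Then drop the original field so it is never indexed.
  mutate {
    remove_field => [ "message" ]
  }
}
```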

{
  "mapping": {
    "properties": {
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "path": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

Any suggestions are welcome.

Do you need the keyword subfields? If not, I'd remove them.

As I understand it, they are required to search the field, aren't they? If not, how would one go about removing them? I think I tried at one point but wasn't allowed to.

You only need keyword fields if you need to aggregate on or sort by the field; full-text search works on the text field itself.
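For example, if message is only ever full-text searched, the mapping could drop its keyword subfield entirely (a sketch; existing field mappings can't be removed in place, so this requires reindexing into a new index):

```json
{
  "mapping": {
    "properties": {
      "message": {
        "type": "text"
      },
      "path": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
```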

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.