Optimize for minimal storage space with many tiny documents

I want to store a lot of data points to run aggregations on them. Here's my index definition:

mappings: {
  datapoint: {
    properties: {
      timestamp: { type: 'date', format: 'date_time'},
      value: { type: 'float', index: 'no', store: false, doc_values: true },
      metric_name: { type: 'string', index: 'not_analyzed', store: false, doc_values: true }
    },
    _all: { enabled: false},
    _source: { enabled: false}
  }
}

The two queries that I'd like to run on this data are: "For timestamp in range a-b and metric_name c, what is the sum of value?" and "Which metric_names do we have?"

There are 800 different metric names, each one about 80 bytes long. Since the cardinality is so low, I was hoping for a good compression ratio. There are 8 bytes per document of real (incompressible) data, plus the ID, which is auto-generated by Elasticsearch.

Right now I am looking at about 3.5 GB for 100M documents of this type, i.e. about 37 bytes per document. Since I am creating a few million new datapoints per day, I'd like to know if I can get the storage requirements down further. Is there anything else I can optimize?
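The per-document figure can be checked with a little arithmetic. A minimal sketch, assuming the quoted index size is in binary gigabytes (that is what yields roughly 37 bytes) and using a hypothetical rate of 3 million new datapoints per day for the growth estimate:

```python
GIB = 1024 ** 3  # assumption: the quoted index size is in binary gigabytes


def bytes_per_doc(index_bytes: float, doc_count: int) -> float:
    """Average on-disk footprint of one document."""
    return index_bytes / doc_count


def yearly_growth_gib(docs_per_day: int, per_doc_bytes: float) -> float:
    """Storage added per year at a constant ingest rate."""
    return docs_per_day * 365 * per_doc_bytes / GIB


current = bytes_per_doc(3.5 * GIB, 100_000_000)

# Hypothetical ingest rate; the post only says "a few million" per day.
DOCS_PER_DAY = 3_000_000

print(f"{current:.1f} bytes per document")
print(f"{yearly_growth_gib(DOCS_PER_DAY, current):.1f} GiB added per year")
```

Plugging a different per-document size into `yearly_growth_gib` shows how much each byte saved is worth over a year.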

What version are you on?

The machine I was testing with has 1.7.3, because I wanted the Sense plugin for easier testing. I could upgrade to 2.0.

I upgraded to 2.1 and set codec: best_compression, and now I am at 25 bytes per document. That's already better, but of course I'd still take suggestions to go down even further.
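For anyone following along, this is roughly what an index creation body with that setting looks like. It is a sketch that has not been run against a cluster: `index.codec` is the full name of the setting in Elasticsearch 2.x, the mapping is copied from the question, and the variable name `create_index_body` is made up. As far as I know the codec is a static setting, so it has to be supplied at index creation (or on a closed index), and existing segments are only recompressed when they are rewritten by a merge.

```python
import json

# Request body for creating the index (hypothetical variable name).
create_index_body = {
    "settings": {
        "index": {
            # Trades some stored-field read speed for a smaller index.
            "codec": "best_compression",
        }
    },
    "mappings": {
        "datapoint": {
            "properties": {
                "timestamp": {"type": "date", "format": "date_time"},
                "value": {
                    "type": "float",
                    "index": "no",
                    "store": False,
                    "doc_values": True,
                },
                "metric_name": {
                    "type": "string",
                    "index": "not_analyzed",
                    "store": False,
                    "doc_values": True,
                },
            },
            "_all": {"enabled": False},
            "_source": {"enabled": False},
        }
    },
}

# Serialize to the JSON that would be sent as the request body.
print(json.dumps(create_index_body, indent=2))
```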

Can you store the metric name in a database and then reference metric.id, which will be an 8-byte integer?
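A minimal sketch of that idea, assuming the name-to-ID table lives in application memory (in practice it would be persisted in the database mentioned above). `MetricRegistry`, the `metric_id` field, and the sample metric name are all hypothetical:

```python
class MetricRegistry:
    """Assigns a small, stable integer ID to each distinct metric name."""

    def __init__(self):
        self._ids = {}    # name -> id
        self._names = []  # id -> name

    def id_for(self, name: str) -> int:
        # First sighting of a name gets the next free ID.
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def name_for(self, metric_id: int) -> str:
        return self._names[metric_id]


registry = MetricRegistry()

# The document sent to Elasticsearch carries the ID instead of the long name.
doc = {
    "timestamp": "2015-12-01T12:00:00.000Z",
    "value": 0.75,
    "metric_id": registry.id_for("hosts.web01.cpu.user"),
}
```

Answering "Which metric names do we have?" then becomes a lookup in the registry rather than a terms aggregation.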

I have only about 1000 unique metric names. If replacing them with integer values helps, there must be something seriously wrong with Elasticsearch's index format.

Are you letting Elasticsearch assign the IDs of the documents? If you know your data very well, you might be able to generate a unique key at the application level that is more compact and save space that way. When designing a key, it is worth reading the advice in this blog post.
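One way such an application-level key could look, as a sketch only. It assumes each metric produces at most one datapoint per millisecond and that metrics are identified by a small integer; `make_key` and `parse_key` are hypothetical helpers, and this does not encode whatever the linked blog post recommends. Whether it actually shrinks the index would have to be measured.

```python
import base64
import struct

# Big-endian: unsigned 16-bit metric ID followed by unsigned 64-bit epoch millis.
_KEY_FORMAT = ">HQ"


def make_key(metric_id: int, timestamp_ms: int) -> str:
    """Pack (metric_id, timestamp) into 10 bytes and encode them as 14 URL-safe characters."""
    raw = struct.pack(_KEY_FORMAT, metric_id, timestamp_ms)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def parse_key(key: str) -> tuple:
    """Inverse of make_key: restore the padding, decode, and unpack."""
    padded = key + "=" * (-len(key) % 4)
    return struct.unpack(_KEY_FORMAT, base64.urlsafe_b64decode(padded))


key = make_key(42, 1448971200000)
print(key, len(key))
```

The key is deterministic, so re-sending the same datapoint overwrites the existing document instead of creating a duplicate.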

Make sure it's a short integer then. Also shorten the field names to 1 character and report back whether the size shrank.
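A sketch of what that experiment could look like, assuming the metric name has already been replaced by an integer ID. The one-character names `t`, `v`, `m` and the `shorten` helper are made up for illustration; `short` is Elasticsearch's 16-bit integer type, which is wide enough for roughly 1000 metric IDs.

```python
# Long field name -> one-character field name (hypothetical choice of letters).
FIELD_ALIASES = {"timestamp": "t", "value": "v", "metric_id": "m"}

short_mapping = {
    "datapoint": {
        "properties": {
            "t": {"type": "date", "format": "date_time"},
            "v": {"type": "float", "index": "no", "doc_values": True},
            # Left indexed (the default) so queries can still filter on it.
            "m": {"type": "short", "doc_values": True},
        },
        "_all": {"enabled": False},
        "_source": {"enabled": False},
    }
}


def shorten(doc: dict) -> dict:
    """Rename a document's fields to their one-character aliases before indexing."""
    return {FIELD_ALIASES[name]: value for name, value in doc.items()}


print(shorten({"timestamp": "2015-12-01T12:00:00.000Z", "value": 0.75, "metric_id": 7}))
```

Indexing the same 100M documents into both layouts and comparing the on-disk size is the only reliable way to tell whether the shorter names make a difference.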