Reduce the size of an index that contains only integer fields

Hi,
I am trying to create an index from integer fields only, like this:
{
  "type": 2,
  "cat_id": 1,
  "value": 12.22,
  "date": "2015-01-01T00:00:00"
}
If anyone has dealt with a similar situation, please give me advice on choosing an appropriate mapping to reduce the index size as much as possible.
Here is mine, please review:
PUT /data
{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 0,
      "store.compress.stored": true
    }
  },
  "mappings": {
    "ds": {
      "_source": { "enabled": false },
      "_all": { "enabled": false },
      "properties": {
        "cat_id": { "type": "integer", "index": "not_analyzed", "precision_step": 4096, "doc_values": true },
        "type":   { "type": "integer", "index": "not_analyzed", "precision_step": 4096, "doc_values": true, "store": "yes" },
        "value":  { "type": "float",   "index": "not_analyzed", "precision_step": 4096, "store": "yes", "fielddata": { "loading": "eager" } },
        "date":   { "type": "date",    "index": "not_analyzed", "precision_step": 4096, "format": "YYYY-MM-dd HH:mm:ss", "doc_values": true }
      }
    }
  }
}

That looks pretty good to me. What problems are you having with it?

Also, you should really store that date in UTC :slight_smile:
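For example, a minimal sketch of what I mean (assuming the date field is parsed with Elasticsearch's default dateOptionalTime format rather than the custom format in your mapping), indexing the value as ISO 8601 with an explicit UTC offset:

PUT /data/ds/1
{
  "type": 2,
  "cat_id": 1,
  "value": 12.22,
  "date": "2015-01-01T00:00:00Z"
}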

I just want to reduce the index size as much as possible. Maybe there are some tricks out there that I don't know about. Hence this topic :smile:

There are trade-offs here, though. For example, setting the precision step to 4096 means that range searches will not be as efficient in terms of response time. Also, I don't think you will save much space by disabling _source and setting store: yes on most of the fields; it may well be better to keep using the _source field. A sketch of that alternative is below.
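Roughly something like this (an illustrative sketch only, not a drop-in replacement): keep _source enabled, drop the per-field store: yes, rely on doc values, and leave precision_step at its default:

PUT /data
{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "ds": {
      "_all": { "enabled": false },
      "properties": {
        "cat_id": { "type": "integer", "doc_values": true },
        "type":   { "type": "integer", "doc_values": true },
        "value":  { "type": "float",   "doc_values": true },
        "date":   { "type": "date",    "doc_values": true }
      }
    }
  }
}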

Also, you should test each of these settings on your use case. You may find that some of them do not decrease disk usage as much as you think, or may in fact increase it (e.g. because compression is not as efficient on separate stored fields as on the _source field).
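One way to compare the variants (using the index name data from the example above) is to index the same data into each candidate index and then check the on-disk size with the standard APIs:

GET /_cat/indices/data?v&h=index,docs.count,store.size

GET /data/_stats/store

Running the comparison after the indices have been optimized down to a similar number of segments makes the numbers more comparable.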
