Hi,
I am writing network device performance data to Elasticsearch. The index for each 24-hour period holds 250227929 documents, which adds up to 32 GB of disk space.
I am looking into ways to reduce disk usage.
The documents look like this:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 4,
"successful": 4,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 714,
"max_score": 16.683826,
"hits": [
{
"_index": "telegraf-interface-2018.06.13",
"_type": "metrics",
"_id": "b0gK-WMBvxGYDnrPBB7n",
"_score": 16.683826,
"_source": {
"@timestamp": "2018-06-13T14:06:12+02:00",
"interface": {
"ifHCInOctets": 166294586509,
"ifHCInUcastPkts": 108997302,
"ifHCOutOctets": 77963521118,
"ifHCOutUcastPkts": 208360884,
"ifHighSpeed": 10000
},
"measurement_name": "interface",
"tag": {
"agent_host": "labrouter.local.net",
"ifAlias": "Uplink_To_switch",
"ifDescr": "TenGigabitEthernet1/50",
"platform_tag": "IOS"
}
}
},
{
"_index": "telegraf-interface-2018.06.13",
"_type": "metrics",
"_id": "RUoK-WMBvxGYDnrPRlTQ",
"_score": 16.683826,
"_source": {
"@timestamp": "2018-06-13T14:00:12+02:00",
"interface": {
"ifHCInOctets": 166294255372,
"ifHCInUcastPkts": 108996735,
"ifHCOutOctets": 77962950948,
"ifHCOutUcastPkts": 208360297,
"ifHighSpeed": 10000
},
"measurement_name": "interface",
"tag": {
"agent_host": "labrouter.local.net",
"ifAlias": "Uplink_To_switch",
"ifDescr": "TenGigabitEthernet1/50",
"platform_tag": "IOS"
}
}
}
]
}
}
The interface part is not indexed; the tag part is indexed. The same pattern is repeated thousands of times.
So I have a few questions.
Are key names actually repeated in every document, and so consuming disk space? Or are repeated key names somehow compressed within the database?
Does key name length have an impact on disk usage? For example, if I shortened the key names, could I expect the documents to consume less disk space?
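As a rough sanity check on the figures above, here is a minimal sketch that compares the average on-disk size per document with the raw JSON size of one sample _source and with the bytes spent on key names alone. It assumes the index size of 32 means 32 * 10^9 bytes and that the sample document is representative; collect_keys is just an illustrative helper name, and nothing here queries Elasticsearch.

```python
import json

# Figures from the post; the index size is assumed to be 32 * 10**9 bytes.
DOCS_PER_DAY = 250_227_929
INDEX_BYTES = 32 * 10**9

# One _source document, copied from the sample response above.
source = {
    "@timestamp": "2018-06-13T14:06:12+02:00",
    "interface": {
        "ifHCInOctets": 166294586509,
        "ifHCInUcastPkts": 108997302,
        "ifHCOutOctets": 77963521118,
        "ifHCOutUcastPkts": 208360884,
        "ifHighSpeed": 10000,
    },
    "measurement_name": "interface",
    "tag": {
        "agent_host": "labrouter.local.net",
        "ifAlias": "Uplink_To_switch",
        "ifDescr": "TenGigabitEthernet1/50",
        "platform_tag": "IOS",
    },
}


def collect_keys(obj):
    """Return every key name in a nested dict, including nested objects."""
    keys = []
    for key, value in obj.items():
        keys.append(key)
        if isinstance(value, dict):
            keys.extend(collect_keys(value))
    return keys


# Size of the document as compact JSON (no extra whitespace).
raw_bytes = len(json.dumps(source, separators=(",", ":")).encode("utf-8"))
# Bytes taken by the key names themselves (quotes and colons not counted).
key_bytes = sum(len(k.encode("utf-8")) for k in collect_keys(source))
# Average space one document takes in the index, from the post's totals.
on_disk_per_doc = INDEX_BYTES / DOCS_PER_DAY

print(f"raw JSON per document:      {raw_bytes} bytes")
print(f"key names per document:     {key_bytes} bytes")
print(f"on disk per document (avg): {on_disk_per_doc:.1f} bytes")
```

If the average on-disk figure comes out well below the raw JSON size, some compression is presumably already taking place, but this check alone does not show how the key names are stored.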