Hi everyone!
I'm seeing weird behavior when measuring the disk usage of Elasticsearch 6.
I test my machine like so:
This is the mapping:
"transaction": {
  "properties": {
    "changedFieldsList": {
      "type": "nested",
      "properties": {
        "fieldChanged": {
          "type": "text"
        },
        "fieldId": {
          "type": "text"
        },
        "fieldType": {
          "type": "text"
        },
        "fieldValue": {
          "type": "text"
        }
      }
    },
    "commandCommitScn": {
      "type": "text"
    },
    "commandId": {
      "type": "text"
    },
    "commandScn": {
      "type": "text"
    },
    "commandSequence": {
      "type": "text"
    },
    "commandTimestamp": {
      "type": "text"
    },
    "commandType": {
      "type": "text"
    },
    "conditionFieldsList": {
      "type": "nested",
      "properties": {
        "fieldId": {
          "type": "text"
        },
        "fieldType": {
          "type": "text"
        },
        "fieldValue": {
          "type": "text"
        }
      }
    },
    "objectDBName": {
      "type": "text"
    },
    "objectId": {
      "type": "text"
    },
    "objectSchemaName": {
      "type": "text"
    }
  }
}
I didn't change anything from the default configuration.
Then I store the data sent to Elasticsearch in a separate folder (let's call it sep_data for short), and I check the size of the sep_data folder and the size of the Elasticsearch data folder using du -shk on my CentOS 7 Linux machine.
While the sep_data folder grew by the same amount each time I added another 100K transactions, the growth of the Elasticsearch data folder varied greatly.
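
For anyone who wants to reproduce the measurement without du, here is a minimal Python sketch of the same idea. It sums apparent file sizes, while du reports allocated disk blocks, so the two can differ slightly. The function names and the Elasticsearch data path in the usage comment are assumptions for illustration, not part of my actual setup.

```python
import os


def dir_size_bytes(root):
    """Sum the apparent size of every file entry under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                # The file disappeared between listing and stat; skip it.
                continue
    return total


def to_mb(num_bytes):
    """Convert bytes to MB (1 MB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


# Example usage (both paths are placeholders):
# print(to_mb(dir_size_bytes("sep_data")))
# print(to_mb(dir_size_bytes("/var/lib/elasticsearch")))
```

Taking a reading before and after each batch of 100K transactions gives the start and end columns of the results table.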
Here are the test results:
Test number | Parsed transactions | sep_data dir size start [MB] | sep_data dir size end [MB] | sep_data dir size growth [MB] | Elastic data folder start size [MB] | Elastic data folder end size [MB] | Elastic data folder growth [MB] | Elastic growth / sep_data growth ratio |
---|---|---|---|---|---|---|---|---|
1 | 100,000 | 0 | 389 | 389 | 0 | 718 | 718 | 185% |
2 | 100,000 | 389 | 778 | 389 | 718 | 936 | 218 | 56% |
3 | 100,000 | 778 | 1167 | 389 | 936 | 1719 | 783 | 201% |
4 | 100,000 | 1167 | 1556 | 389 | 1719 | 2020 | 301 | 77% |
5 | 100,000 | 1556 | 1945 | 389 | 2020 | 2840 | 820 | 211% |
6 | 100,000 | 1945 | 2334 | 389 | 2840 | 3333 | 493 | 127% |
7 | 100,000 | 2334 | 2723 | 389 | 3333 | 3642 | 309 | 79% |
8 | 100,000 | 2723 | 3112 | 389 | 3642 | 3747 | 105 | 27% |
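
The ratio column can be recomputed from the end sizes alone. The sketch below does that and also prints a cumulative ratio (total Elastic size divided by total sep_data size after each test), which is an extra view I'm adding here and not a column in the table. The variable names are only for illustration.

```python
# End sizes in MB after each test, copied from the table above.
sep_end = [389, 778, 1167, 1556, 1945, 2334, 2723, 3112]
es_end = [718, 936, 1719, 2020, 2840, 3333, 3642, 3747]


def growth(end_sizes):
    """Return the per-test growth, assuming the folder starts at 0 MB."""
    deltas = []
    previous = 0
    for size in end_sizes:
        deltas.append(size - previous)
        previous = size
    return deltas


sep_growth = growth(sep_end)
es_growth = growth(es_end)

# Per-test ratio: Elastic growth / sep_data growth, as a rounded percentage.
per_test_ratio = [round(100 * e / s) for e, s in zip(es_growth, sep_growth)]

# Cumulative ratio: total Elastic size / total sep_data size after each test.
cumulative_ratio = [round(100 * e / s) for e, s in zip(es_end, sep_end)]

print(per_test_ratio)    # [185, 56, 201, 77, 211, 127, 79, 27]
print(cumulative_ratio)  # [185, 120, 147, 130, 146, 143, 134, 120]
```

The per-test ratio swings between 27% and 211%, while the cumulative ratio stays between 120% and 185%.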
I find this data very weird, and I'm having trouble explaining why Elasticsearch behaves this way.
I need a reliable way to predict how much disk storage will be used when I insert X transactions, and I need to be able to explain why Elasticsearch does this.
So, can anyone help me?