A couple of us have been comparing notes on our Logstash installations at
the larger end of the scale, and something about ElasticSearch has us
baffled.
We're hoping someone here can shed some light on this.
Currently I have 228,262,883 documents in an index.
It's taking up 243.1GB of space on disk.
The average size of the messages going into Logstash (which are then
converted to JSON and put into ES) was only ~500 bytes each.
At 500 bytes, that's about 106GB of raw logs.
I'm adding many fields to the JSON which gets dropped into ES, but still...
I would expect that with compression the space used would go down, not up.
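For the record, the back-of-envelope math:

  228,262,883 docs * ~500 bytes/line   = ~106GB of raw logs
  243.1GB on disk / 228,262,883 docs   = ~1.1KB per document

So each ~500 byte log line is costing us roughly twice its raw size once
indexed. A typical event after Logstash has added its fields looks roughly
like this (field names simplified, values made up):

  {
    "@timestamp": "2013-11-13T17:45:01.000Z",
    "@version": "1",
    "host": "web042.example.com",
    "type": "syslog",
    "message": "Nov 13 17:45:01 web042 CRON[12345]: (root) CMD (...)",
    ... plus the extra fields we add ourselves ...
  }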
This is my mapping: https://gist.github.com/avleen/7440270
The only field being analyzed is "message".
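(Paraphrasing the gist, the relevant bit of the mapping looks more or less
like this; field names trimmed for brevity:

  "message":     { "type": "string" },
  "some_field":  { "type": "string", "index": "not_analyzed" },
  "other_field": { "type": "string", "index": "not_analyzed" },
  ...

i.e. everything except "message" is not_analyzed.)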
And... we just removed the "message" field from being sent to Elasticsearch.
The docs/index size ratio did not change much at all (if any).
Still seeing ~1KB - 1.5KB of disk space used per document in Elasticsearch.
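(Those numbers are just docs.count vs store.size from the index stats API,
e.g. something like:

  curl 'localhost:9200/logstash-2013.11.13/_stats?pretty'

with a real index name substituted, then dividing store.size_in_bytes by
docs.count.)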
It seems odd that, for such a small source, the stored space should be so
much larger even with LZ4 compression.
We noticed that while store-level compression might be helping some, it
doesn't seem to be helping as much as it could. Running gzip on the data
(both the raw logs and the index files) yields quite a bit more compression
than we're getting right now.
Likewise, enabling compression on ZFS reduced the space taken by almost
half.
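(The gzip test was nothing scientific, just something along the lines of
the following, with the paths adjusted for your data dir / index / shard:

  cd /var/lib/elasticsearch/<cluster>/nodes/0/indices/<index>/0/index
  du -sh .                        # size on disk as ES stores it
  tar cf - . | gzip -9 | wc -c    # bytes after gzip, for comparison

and then comparing the two numbers.)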
Overall, I'm trying to index several billion log lines per day, and
multi-TB indexes add up in cost.
Does anyone have any suggestions on what we could do?
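For reference, this is the sort of knob we're asking about: a minimal sketch
of an index template that disables _all and makes every string not_analyzed
(illustrative only, not what we're running today):

  {
    "template": "logstash-*",
    "mappings": {
      "_default_": {
        "_all": { "enabled": false },
        "dynamic_templates": [ {
          "strings_not_analyzed": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": { "type": "string", "index": "not_analyzed" }
          }
        } ]
      }
    }
  }

We'd love to hear what has actually worked for other people at this scale.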
(Big thanks to Jordan, who has already gone way out of his way to help with
this!)
Thanks