Write amplification and SSD

At the core training I attended last year there was a side note on SSDs and
write amplification, roughly along these lines: write amplification can be
a big problem with SSDs (writes can be as small as around 4KB, but erases
often happen in blocks of around 512KB, and the problem gets worse the
smaller and more random the writes are), but write amplification is never
an issue in ES because all writes are sequential anyway (I am reading from
my notes here).

What does that mean exactly? That write amplification can be a big
problem with SSDs in general but not with ES on SSDs, or that the problem
is only relevant with lots of random writes? (I suspect the former, but am
not quite sure.)
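
To put the numbers from the question in perspective, here is a rough sketch of the worst case. It assumes a deliberately simplified model in which every erase block touched by a write has to be rewritten in full and writes are aligned to block boundaries; real controllers remap pages and do much better, so treat this as an upper bound only. The function name and the model are illustrative assumptions, not taken from the training material.

```python
PAGE_KB = 4           # small write size quoted in the question
ERASE_BLOCK_KB = 512  # erase block size quoted in the question


def worst_case_write_amplification(host_write_kb: int) -> float:
    """Flash KB written per host KB, assuming every touched erase block
    is rewritten in full (simplified worst-case model, aligned writes)."""
    # Ceiling division: number of erase blocks the write touches.
    blocks_touched = -(-host_write_kb // ERASE_BLOCK_KB)
    flash_written_kb = blocks_touched * ERASE_BLOCK_KB
    return flash_written_kb / host_write_kb


# A single 4KB random update can cost a whole 512KB block rewrite,
# while a large sequential write that fills whole blocks costs nothing extra.
small_random = worst_case_write_amplification(PAGE_KB)
large_sequential = worst_case_write_amplification(ERASE_BLOCK_KB * 8)
```

Under this model the small random write is amplified 128 times, while the large sequential write is not amplified at all, which is the intuition behind "sequential writes are easy on the SSD".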

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/67ed8b75-fe8b-4f40-b41a-b66cf6eb82bc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

It means that ES works well with SSDs: Lucene is write-once under the
hood, so it is "easy" on the SSDs, whereas other approaches do random
writes to different places, which causes higher write amplification.

However, this is offset by the fact that Lucene must also periodically
merge segments, which is in fact its own, higher-level form of write
amplification: when you first index a doc, it's written into a new segment,
but over that doc's lifetime in the index it may be copied roughly another
4-5 times before it lives in a "max sized" segment. Still, that higher
write amplification likely works out to much less stress on the SSD than
databases that do random writes to their stores.

Mike McCandless

http://blog.mikemccandless.com
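
The merge cost described in the reply above can be put in rough numbers with a toy simulation. This is only a sketch under stated assumptions: it models a generic tiered scheme in which every 10 segments of the same size are merged into one, up to a fixed top tier. It is not Lucene's actual merge policy, and `merge_factor` and `max_tier` are hypothetical parameters chosen for illustration.

```python
def simulate_tiered_merges(num_flushes: int,
                           merge_factor: int = 10,
                           max_tier: int = 4) -> float:
    """Return (units written) / (units indexed) for a toy tiered merge scheme.

    Each flush writes one unit-sized segment at tier 0. When merge_factor
    segments accumulate at a tier, they are merged into one segment at the
    next tier, which rewrites every unit they contain.
    """
    tiers = [0] * (max_tier + 1)  # tiers[i] = segment count at tier i
    written = 0
    for _ in range(num_flushes):
        tiers[0] += 1
        written += 1  # the initial flush writes the doc once
        level = 0
        while level < max_tier and tiers[level] == merge_factor:
            tiers[level] = 0
            tiers[level + 1] += 1
            # The merged segment holds merge_factor ** (level + 1) units,
            # all of which are copied to new storage.
            written += merge_factor ** (level + 1)
            level += 1
    return written / num_flushes


amplification = simulate_tiered_merges(10_000)
```

With these assumed parameters each doc is written once at flush time and copied four more times on its way to the top tier, which is in the same range as the "4-5 times" mentioned above. All of those copies are large sequential writes of whole new segments rather than in-place updates.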

(Mike's reply above answers AndrewK's post of Tue, Dec 16, 2014, 4:11 AM.)

All SSDs perform internal rewrites for wear leveling and garbage
collection. The issue is not caused only by random writes from the
application; the point is that too many internal rewrites reduce SSD
performance and lifetime.

I think the application layer can contribute only a little to reducing
write amplification, because SSD performance in this area depends mainly
on the controller algorithms. E.g. SandForce controllers use compression
and can achieve write amplification rates of 0.14, much less than other
controllers:

Jörg
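
As a small illustration of how a figure below 1 is possible at all: write amplification is commonly expressed as the ratio of data physically written to flash to data written by the host. If the controller compresses data before writing it, the physical amount can be smaller than what the host sent. The numbers below are made up purely to show the arithmetic; they are not measurements of any controller.

```python
def write_amplification_factor(host_gb: float, flash_gb: float) -> float:
    """Ratio of data physically written to flash to data written by the host."""
    if host_gb <= 0:
        raise ValueError("host_gb must be positive")
    return flash_gb / host_gb


# Hypothetical example: the host writes 100 GB of highly compressible data,
# the controller stores it in 10 GB, and garbage collection plus wear
# leveling rewrite another 4 GB internally.
host_gb = 100.0
flash_gb = 10.0 + 4.0
waf = write_amplification_factor(host_gb, flash_gb)

# Without compression the same internal rewrites give a factor above 1.
waf_uncompressed = write_amplification_factor(host_gb, 100.0 + 4.0)
```

In this made-up case the factor comes out at about 0.14 with compression and about 1.04 without it, so the result depends heavily on how compressible the data is.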

(Jörg's reply above answers Michael McCandless's post of Tue, Dec 16, 2014, 10:29 AM.)

Thank you for the feedback!
