Our index size seems to grow and grow over time. I assume this is due to
versioning. I don't care about having multiple versions of a document.
How can I just turn it off?
--
In fact, ES doesn't store multiple versions of a document; it only keeps a version number per document, which you can use for optimistic concurrency control. See:
http://www.elasticsearch.org/guide/reference/api/index_.html
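To illustrate what that version number buys you, here is a minimal sketch in Python that simulates optimistic concurrency control in a toy in-memory store rather than calling a real cluster (the `DocStore` class and its names are made up for illustration; Elasticsearch does the equivalent check server-side against `_version`):

```python
# Toy in-memory store illustrating optimistic concurrency control with
# per-document version numbers, in the spirit of Elasticsearch's _version.
class VersionConflict(Exception):
    pass

class DocStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        current = self._docs.get(doc_id)
        current_version = current[0] if current else 0
        # Reject the write if the caller's expected version is stale.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected {expected_version}, current is {current_version}")
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version

store = DocStore()
v1 = store.index("1", {"user": "cailin"})                        # first write
v2 = store.index("1", {"user": "cailin"}, expected_version=v1)   # ok
try:
    store.index("1", {"user": "stale"}, expected_version=v1)     # stale write
except VersionConflict as e:
    print("conflict:", e)
```

Only the latest version of each document is kept; the counter exists so a stale writer can be rejected, not so old copies can be retrieved.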
That said, you can call the Optimize API to optimize your index. See:
http://www.elasticsearch.org/guide/reference/api/admin-indices-optimize.html
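For reference, an Optimize call is just an HTTP POST to the index's `_optimize` endpoint. A sketch that builds the request (the index name `my_index` and the local node on port 9200 are assumptions; the request itself is left commented out so nothing is sent here):

```python
# Build the Optimize API request for a hypothetical index.
# only_expunge_deletes asks Lucene to merge away deleted docs
# instead of fully optimizing down to few segments.
index = "my_index"
params = "only_expunge_deletes=true"
optimize_url = f"http://localhost:9200/{index}/_optimize?{params}"

# To actually send it:
#   import urllib.request
#   urllib.request.urlopen(urllib.request.Request(optimize_url, method="POST"))
print(optimize_url)
```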
Are you creating and deleting many documents?
HTH
David
On 19 October 2012 at 15:59, Cailin Nelson cailin@turntable.fm wrote:
Our index size seems to grow and grow over time. I assume this is due to
versioning. I don't care about having multiple versions of a document. How
can I just turn it off?
--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
--
Ah - thanks for the clarification.
Our rough numbers are something like the following:
- 20 million docs
- 500,000 re-indexes per day
- 10,000 deletes per day (but occasionally up to 1 million)
- 10,000 new docs per day
Is the optimize API something I should be working into our regular process?
--
I understand the "modern way" to do this is not to optimize, but to modify the merge policy, which is used to automatically combine the low-level Lucene segments as needed. The values that trigger a merge, and how segments are re-merged, are all defined by the merge policy.
I too was worried about optimize until I saw the merge policy kick in and clean things up. What you want is for deleted documents to end up in segments that are small enough that they will eventually either disappear completely or get merged with others, dropping the deletes and even eliminating whole segment files, while keeping a balance between having too many segments and having segments that never get merged together.
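Concretely, those knobs live under the index's `index.merge.policy.*` settings. A minimal sketch that builds such a settings body (the parameter names follow the tiered merge policy of that era and the values are illustrative assumptions to check against your ES version; nothing is sent here):

```python
import json

# Sketch of an index-settings body tuning the tiered merge policy so that
# small, delete-heavy segments get merged (and their deletes dropped) sooner.
# Treat the exact setting names and values as assumptions, not a recipe.
settings = {
    "index.merge.policy.segments_per_tier": 5,         # fewer segments per tier
    "index.merge.policy.max_merge_at_once": 5,         # merge width
    "index.merge.policy.reclaim_deletes_weight": 3.0,  # favor delete-heavy merges
}
body = json.dumps(settings)
# PUT this body to http://localhost:9200/my_index/_settings (not executed here).
print(body)
```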
-Paul
On 10/19/2012 7:31 AM, Cailin Nelson wrote:
Ah - thanks for the clarification.
Our rough numbers are something like the following:
- 20 million docs
- 500,000 re-indexes per day
- 10,000 deletes per day (but occasionally up to 1 million)
- 10,000 new docs per day
Is the optimize API something I should be working into our regular process?
--
As Paul said, there is a background merge process running all the time, with reasonable defaults, so you don't need to run the optimize API in a scheduled manner at all.
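If you want to confirm that background merging is keeping up, you can poll the segments and stats endpoints and watch segment counts fall after merges. A sketch building the URLs (the index name and local host are assumptions, and the requests are not sent here):

```python
# URLs for inspecting segment and merge activity on a hypothetical index.
index = "my_index"
base = f"http://localhost:9200/{index}"
segments_url = f"{base}/_segments"       # per-shard segment listing
stats_url = f"{base}/_stats?merge=true"  # cumulative merge statistics
# GET these with any HTTP client and compare over time.
print(segments_url)
print(stats_url)
```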
On Oct 19, 2012, at 9:56 PM, P. Hill parehill1@gmail.com wrote:
I understand the "modern way" to do this is not to optimize, but to modify the merge policy, which is used to automatically combine the low-level Lucene segments as needed. The values that trigger a merge, and how segments are re-merged, are all defined by the merge policy.
I too was worried about optimize until I saw the merge policy kick in and clean things up. What you want is for deleted documents to end up in segments that are small enough that they will eventually either disappear completely or get merged with others, dropping the deletes and even eliminating whole segment files, while keeping a balance between having too many segments and having segments that never get merged together.
-Paul
On 10/19/2012 7:31 AM, Cailin Nelson wrote:
Ah - thanks for the clarification.
Our rough numbers are something like the following:
- 20 million docs
- 500,000 re-indexes per day
- 10,000 deletes per day (but occasionally up to 1 million)
- 10,000 new docs per day
Is the optimize API something I should be working into our regular process?
--
--