How can I disable versioning?


(Cailin Nelson) #1

Our index size seems to grow and grow over time. I assume this is due to
versioning. I don't care about having multiple versions of a document.
How can I just turn it off?

--


(David Pilato) #2

In fact, ES doesn't store multiple versions of a document; it uses a version
number for optional concurrency control. See:
http://www.elasticsearch.org/guide/reference/api/index_.html
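
To illustrate the idea of a version number used only for optional concurrency control, here is a minimal in-memory sketch. It is not Elasticsearch code: `TinyStore`, `VersionConflict`, and the `expected_version` parameter are hypothetical names made up for this example. It only shows that one copy is kept per id and that the version check runs only when the caller asks for it.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class TinyStore:
    """In-memory stand-in for an index: one current copy per document id."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # The check is optional: it only runs when a version is supplied.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        # The old copy is replaced, not kept alongside the new one.
        self._docs[doc_id] = (new_version, source)
        return new_version

    def get(self, doc_id):
        return self._docs[doc_id]


store = TinyStore()
v1 = store.index("1", {"title": "first"})
v2 = store.index("1", {"title": "second"}, expected_version=v1)

try:
    # A writer still holding the old version number is rejected.
    store.index("1", {"title": "stale write"}, expected_version=v1)
    conflict = False
except VersionConflict:
    conflict = True
```

After these calls the store holds a single copy of document "1", and the stale write has been refused instead of silently overwriting the newer one.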

That said, you can call the Optimize API to optimize your index. See:
http://www.elasticsearch.org/guide/reference/api/admin-indices-optimize.html
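
As a rough sketch of what such a call looks like, the helper below only builds the request URL; it does not send anything. The host `http://localhost:9200` and the index name `myindex` are placeholders, and `optimize_url` is a hypothetical helper written for this example. The `_optimize` endpoint and its `max_num_segments` and `only_expunge_deletes` parameters are taken from the documentation linked above, so check them against the version you run.

```python
from urllib.parse import urlencode


def optimize_url(base_url, index, max_num_segments=None, only_expunge_deletes=False):
    """Build the URL for an optimize request on one index (sent with POST)."""
    params = {}
    if max_num_segments is not None:
        params["max_num_segments"] = max_num_segments
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    url = f"{base_url.rstrip('/')}/{index}/_optimize"
    return f"{url}?{urlencode(params)}" if params else url


# Placeholder host and index name; adjust to your cluster.
url = optimize_url("http://localhost:9200", "myindex", only_expunge_deletes=True)
print(url)
```

The resulting URL would then be sent as a POST request with whatever HTTP client you already use.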

Are you creating and deleting many documents?

HTH
David

On 19 October 2012 at 15:59, Cailin Nelson cailin@turntable.fm wrote:

Our index size seems to grow and grow over time. I assume this is due to
versioning. I don't care about having multiple versions of a document. How
can I just turn it off?

--

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(Cailin Nelson) #3

Ah - thanks for the clarification.

Our rough numbers are something like the following:

  • 20 million docs
  • 500,000 re-indexes per day
  • 10,000 deletes per day (but occasionally up to 1 million)
  • 10,000 new docs per day

Is the optimize API something I should be working into our regular process?
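
For scale, here is a back-of-the-envelope sketch of those numbers. It assumes that a re-index marks the old copy of the document as deleted and writes a new copy, so each re-index leaves one dead copy behind until segments are merged. The variable names are made up for this example, and the occasional 1 million delete spikes are ignored.

```python
total_docs = 20_000_000
reindexes_per_day = 500_000
deletes_per_day = 10_000
new_docs_per_day = 10_000

# Assumption: each re-index leaves one deleted copy behind, as does each delete.
dead_copies_per_day = reindexes_per_day + deletes_per_day

# Share of the index that turns into dead copies each day if nothing is merged.
dead_fraction_per_day = dead_copies_per_day / total_docs

# Live document count barely moves: new docs and deletes cancel out.
net_live_growth_per_day = new_docs_per_day - deletes_per_day

print(dead_copies_per_day)                 # 510000
print(f"{dead_fraction_per_day:.2%}")      # 2.55%
print(net_live_growth_per_day)             # 0
```

So the live document count is roughly flat, and any growth on disk would come from dead copies that have not yet been merged away.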

--


(phill) #4

I understand the "modern way" to do that is not to optimize, but to
modify the merge policy, which automatically combines the low-level
Lucene segments as needed. The merge policy defines both the values
that trigger a merge and how the merging is carried out.

I too was worried about optimize until I saw the merge policy kick
in and clean things up. What you want is for deleted documents to
end up in segments small enough that they eventually either disappear
completely or get merged with others, which drops the deletes and can
even eliminate whole segment files. The balance to strike is between
having too many segments and having segments that never get merged.
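
Here is a toy model of that effect, as a sketch only. It is not Lucene's merge policy: the `Segment` class, the `merge` function, and the document counts are invented for illustration. It shows just the two outcomes described above: a segment whose documents are all deleted can disappear outright, and merging the rest copies only live documents.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    live: int      # documents still visible to searches
    deleted: int   # documents marked deleted but still taking up space

    @property
    def on_disk(self):
        return self.live + self.deleted


def merge(segments):
    """Combine segments into one; only live documents are copied over."""
    return Segment(live=sum(s.live for s in segments), deleted=0)


segments = [Segment(900, 100), Segment(50, 450), Segment(0, 200)]
before = sum(s.on_disk for s in segments)

# A segment with no live documents left can be dropped without any merge.
segments = [s for s in segments if s.live > 0]

# Merging the remaining segments leaves the deleted documents behind.
merged = merge(segments)
after = merged.on_disk

print(before, after)  # 1700 950
```

In this made-up case the documents on disk go from 1700 to 950 with no explicit optimize call, which is the cleanup the merge policy performs in the background.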

-Paul

--


(Shay Banon) #5

As Paul said, there is a background merge process running all the time, with reasonable defaults, so you don't need to run the optimize API in a scheduled manner at all.

--