Does bulk indexing need an optimize at the end?


(Onilton Maciel) #1

I am bulk indexing around 50-100 documents every 30 seconds to 1 minute
(the indexing runs all the time, because new documents are always
coming in).

Do I need to run optimize after each of those bulk requests? Or can I rely
on Lucene's merge process?

I know that if I am bulk indexing a whole set of documents (say 25
million, 300 documents per request), it's good to run optimize once
all the indexing has finished.

But in the situation I mentioned earlier, do I need to run optimize?

Is segment merging for bulk indexing the same as for single-document indexing?


(Shay Banon) #2

You don't need to run optimize; the merge process will happen continuously, based on the relevant merge settings.
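
For the continuous case, a minimal sketch of what each small batch might look like, assuming an index named "test", a type named "doc", and input with one JSON document per line. The helper names `build_bulk_body` and `send_batch` are hypothetical, and the `_type` field in the action line applies to Elasticsearch versions from the era of this thread. The point is only that nothing follows the bulk request: no optimize call, just the background merge process.

```shell
#!/bin/sh
# Hypothetical defaults: local node, index "test", type "doc".
ES_URL="${ES_URL:-http://localhost:9200}"

# Turn one-JSON-document-per-line input into a bulk request body:
# an "index" action line followed by the document itself.
build_bulk_body() {
  idx="$1"; typ="$2"
  while IFS= read -r doc || [ -n "$doc" ]; do
    [ -n "$doc" ] || continue   # skip empty lines
    printf '{ "index" : { "_index" : "%s", "_type" : "%s" } }\n' "$idx" "$typ"
    printf '%s\n' "$doc"
  done
}

# Send one small batch to the bulk API. There is deliberately no
# _optimize call afterwards; merging is left to the merge process.
send_batch() {
  build_bulk_body "$1" "$2" | curl -s -XPOST "$ES_URL/_bulk" --data-binary @-
}

# Example (needs a running node): send_batch test doc < batch.json
```

Calling `send_batch` every 30 seconds or so would match the workload described in the question.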



(Onilton Maciel) #3

OK, my question came from this text found on elasticsearch.org:

http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html

"Bulk Indexing Usage"

...

Then, once bulk indexing is done, the settings can be updated (back to the defaults for example):
curl -XPUT localhost:9200/test/_settings -d '{
    "index" : {
        "refresh_interval" : "1s"
    }
}'
And, an optimize should be called:
curl -XPOST 'http://localhost:9200/test/_optimize?max_num_segments=5'



(Shay Banon) #4

Yeah, I meant bulk indexing as in loading a large amount of data in one go; while you are at it, it might make sense to optimize at the end. I did not mean the bulk index API :)
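
For the large one-off load, the sequence from the quoted documentation could be sketched roughly as follows. The helper names `es` and `with_bulk_settings` and the `DRY_RUN` switch are hypothetical, and the initial `"refresh_interval" : "-1"` step is an assumption about the part of the documentation elided in the quote above. The `1s` value and the `_optimize?max_num_segments=5` call are taken from the quoted text.

```shell
#!/bin/sh
# Hypothetical default: local node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Issue a request with curl, or only print it when DRY_RUN=1.
es() {
  method="$1"; path="$2"; body="$3"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s %s%s\n' "$method" "$path" "${body:+ $body}"
  elif [ -n "$body" ]; then
    curl -s -X"$method" "$ES_URL$path" -d "$body"
  else
    curl -s -X"$method" "$ES_URL$path"
  fi
}

# Wrap a large load: relax refresh, run the load command, restore
# the refresh interval, then optimize once at the very end.
with_bulk_settings() {
  idx="$1"; shift
  # Assumed pre-load step: disable periodic refresh.
  es PUT "/$idx/_settings" '{ "index" : { "refresh_interval" : "-1" } }'
  "$@"   # the load itself, e.g. a script sending many bulk requests
  es PUT "/$idx/_settings" '{ "index" : { "refresh_interval" : "1s" } }'
  es POST "/$idx/_optimize?max_num_segments=5"
}

# Example (hypothetical loader script):
#   with_bulk_settings test ./load_all_documents.sh
```

This sketch does not restore the settings if the load command fails part-way; a real script would probably want to handle that case.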


