I will have a look. A note on your usage of XContentBuilder: XContentFactory#jsonBuilder returns a cached instance, and you are expected to pass the result to an API before you call it again on the same thread. So, in your case, create each builder and add it directly to a BulkRequestBuilder.
I have just added safeJsonBuilder which will return one that is not cached per thread, but you should use the above.
Also, scrolling is not the best way to reindex the data, as it can get pretty expensive as you page through the hit list. I am planning to add a "scan" type search that will not incur this overhead (but will also do no sorting).
-shay.banon
On Tuesday, February 8, 2011 at 10:29 PM, Karussell wrote:
On Wednesday, February 9, 2011 at 12:07 AM, Karussell wrote:
So, in your case, create and add it directly to a BulkRequestBuilder.
ok. what's so expensive to create that jsonBuilder?
It's not a question of expense. The byte buffer behind it is reused through a thread local to reduce byte copying.
safeJsonBuilder
do you mean?
XContentBuilder docBuilder =
JsonXContent.unCachedContentBuilder().startObject();
Yea, that is there as well. I added safeJsonBuilder that delegates to that one.
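To make the thread-local reuse concrete, here is a minimal, self-contained sketch. It is not Elasticsearch code: `CachedBuilderSketch`, `Builder`, `cachedBuilder` and `safeBuilder` are hypothetical stand-ins, and a `StringBuilder` plays the role of the shared byte buffer. It only illustrates why a cached builder has to be consumed before the next one is requested on the same thread.

```java
/** Illustrative stand-in only; NOT the Elasticsearch XContentBuilder. */
final class CachedBuilderSketch {

    /** Hypothetical builder that writes into a buffer it may share with other builders. */
    static final class Builder {
        private final StringBuilder buffer;

        Builder(StringBuilder buffer) {
            this.buffer = buffer;
        }

        Builder field(String name, String value) {
            buffer.append(buffer.length() == 0 ? "{" : ",");
            buffer.append('"').append(name).append("\":\"").append(value).append('"');
            return this;
        }

        /** Copies the content out of the buffer, as handing it to an API would. */
        String string() {
            return buffer + "}";
        }
    }

    // One reusable buffer per thread (assumption: mirrors the caching described above).
    private static final ThreadLocal<StringBuilder> CACHE =
            ThreadLocal.withInitial(StringBuilder::new);

    /** Cached variant: every call resets and reuses the current thread's buffer. */
    static Builder cachedBuilder() {
        StringBuilder shared = CACHE.get();
        shared.setLength(0); // wipes whatever an earlier, unconsumed builder wrote
        return new Builder(shared);
    }

    /** Uncached variant: a fresh buffer per call, at the cost of an extra allocation. */
    static Builder safeBuilder() {
        return new Builder(new StringBuilder());
    }

    public static void main(String[] args) {
        // Hazard: two cached builders alive at once share one buffer.
        Builder first = cachedBuilder().field("id", "1");
        Builder second = cachedBuilder().field("id", "2");
        System.out.println(first.string() + " " + second.string());

        // Intended pattern: consume each builder before asking for the next one.
        String a = cachedBuilder().field("id", "1").string();
        String b = cachedBuilder().field("id", "2").string();
        System.out.println(a + " " + b);
    }
}
```

In the hazard case both builders end up showing the second document, because the second call reset the shared buffer. Consuming each builder immediately, or using the uncached variant, avoids that.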
Also, scrolling is not the best way to reindex the data
So, how would you do this at the moment? Would you use a simple search
(e.g. with a date filter) instead of scrolling?
You will have the same problem with search and doing pagination using from / size. That's why the scan type is needed. The overhead does not come from the scrolling implementation, but from how search is executed when it needs to do a sort in Lucene, and when it's distributed.
thanks for your answers! Do you know where the flaw could come from? I
would like to investigate/fix it as I'm unsure if reindexing works
even with this flaw ...
you will have the same problem with search and doing pagination using from / size
Does it mean that if I want docs from pages A..B then ES has to use
memory for 0..B ?
Could I overcome this problem with additional filter queries if I know
how to split up my data - e.g. based on date?
Would the scan type be without scoring (I don't need that)?
Yes, memory (at least the priority queue used by Lucene) will be used for 0..B. The scan type will be without scoring, yes, and is mainly aimed at things like scanning all the hits that match a specific query.
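As a rough illustration of why page A..B costs as much as 0..B, here is a self-contained sketch. It is not how Lucene or Elasticsearch implement this: `DeepPagingSketch`, `shardTop` and `page` are hypothetical names, and plain integers stand in for sort keys. The assumption is only that each shard keeps a queue of `from + size` entries and that the results are merged before the first `from` hits are discarded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only; integers stand in for sort keys of matching documents. */
final class DeepPagingSketch {

    /** Best n keys of one shard, highest first, using a min-heap bounded to n entries. */
    static List<Integer> shardTop(List<Integer> keys, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // smallest key at the head
        for (int key : keys) {
            heap.offer(key);
            if (heap.size() > n) {
                heap.poll(); // evict the current minimum to stay within n entries
            }
        }
        List<Integer> top = new ArrayList<>(heap);
        top.sort(Collections.reverseOrder());
        return top;
    }

    /** Returns hits [from, from + size) of the merged, sorted result across shards. */
    static List<Integer> page(List<List<Integer>> shards, int from, int size) {
        int needed = from + size; // every shard must track this many, not just 'size'
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shards) {
            merged.addAll(shardTop(shard, needed));
        }
        merged.sort(Collections.reverseOrder());
        int start = Math.min(from, merged.size());
        int end = Math.min(needed, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }
}
```

The point of the sketch is the `needed = from + size` line: the queue on every shard grows with the page offset, so deep pages get more expensive even though only `size` hits are returned.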
I am not sure why it works without specifying the index but not when you do. It should not matter when working with a single index. A good place to start in order to solve it is to take your code and turn it into a test case in ES that reproduces the problem. Check SearchScrollTests in the integration module; you can add a test there.
-shay.banon
On Wednesday, February 9, 2011 at 8:47 PM, Karussell wrote:
hmmh, when I'm doing an 'implicit search' (_all??) it works correctly: