Knapsack export limit

Has anybody played around with the knapsack export plugin? What I'm
encountering is that some indices export fully OK (100k+ docs), but for one
that was over 31.5 GB with 600k+ docs, the export seems to have been capped
at 20k docs. Could it be a timeout thing? Or are there some other
limitations I'm not aware of?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Awwww... killer. It could be the 8 GB limit of the TAR format I use (the
size field holds 11 octal digits); I have to recheck to be sure. I also
have to add alternative, reliable package formats for large sequential
streams to the plugin. See also the discussion at
https://groups.google.com/d/topic/digital-curation/qOMmsakk07w/discussion

Thanks for the pointer!

Jörg
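
For reference, the 8 GB figure can be checked with a little arithmetic. What follows is only a sketch using Python's standard `tarfile` module, not the plugin's Java code, and the entry name `big.bin` is made up. It assumes the classic (ustar) header layout, where the size of each entry is stored as 11 octal digits, and compares it with the GNU and PAX extensions, which can record larger sizes. Whether this limit is what actually capped the export is not confirmed at this point in the thread.

```python
import tarfile

# The classic tar header has a 12-byte size field: 11 octal digits plus a terminator.
OCTAL_DIGITS = 11
max_ustar_size = 8 ** OCTAL_DIGITS - 1  # largest entry size that fits, in bytes


def header_fits(size, fmt):
    """Return True if a header for an entry of `size` bytes can be built in format `fmt`."""
    info = tarfile.TarInfo(name="big.bin")  # hypothetical entry name
    info.size = size
    try:
        info.tobuf(format=fmt)
        return True
    except ValueError:  # raised when a number does not fit its header field
        return False


if __name__ == "__main__":
    print(max_ustar_size, "bytes is the largest size the octal field can hold")
    too_big = max_ustar_size + 1  # exactly 8 GiB
    for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                       ("gnu", tarfile.GNU_FORMAT),
                       ("pax", tarfile.PAX_FORMAT)]:
        print(label, header_fits(too_big, fmt))
```

Only the header is built here, so no large file is needed to try it.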

On 19.02.13 22:16, krispyjala wrote:
> Has anybody played around with the knapsack export plugin? What I'm
> encountering is that some indices export fully OK (100k+ docs), but for
> one that was over 31.5 GB with 600k+ docs, the export seems to have been
> capped at 20k docs. Could it be a timeout thing? Or are there some other
> limitations I'm not aware of?

No problem, thanks for writing it! I know the Elasticsearch guys are working
on a truly robust import/export feature, but for now this is a real lifesaver!

--KJ

On Tuesday, February 19, 2013 2:23:03 PM UTC-8, Jörg Prante wrote:
> Awwww... killer. It could be the 8 GB limit of the TAR format I use (the
> size field holds 11 octal digits); I have to recheck to be sure. I also
> have to add alternative, reliable package formats for large sequential
> streams to the plugin. See also the discussion at
> https://groups.google.com/d/topic/digital-curation/qOMmsakk07w/discussion
>
> Thanks for the pointer!
>
> Jörg

Any update on this, Jörg? I'm trying to export an index with 11 million
docs and size of about 150GB.

Thanks,
Kris

I just hacked together a snapshot version and gave it only a quick test, but
can you try whether it works better? 1.1.0-SNAPSHOT now uses Apache Commons
Compress for writing the tar archive.

https://github.com/jprante/elasticsearch-knapsack/blob/master/downloads/elasticsearch-knapsack-1.1.0-SNAPSHOT.zip?raw=true

Best,

Jörg

On 13.03.13 16:46, krispyjala wrote:
> Any update on this, Jörg? I'm trying to export an index with 11
> million docs and size of about 150GB.
>
> Thanks,
> Kris

Thanks for the snapshot, Jörg. However, I am running into this error and
so the export stops:

[2013-03-19 14:32:49,306][DEBUG][action.search.type ] [Ron Weasley] [868242] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [Hermione Granger][inet[/192.168.1.131:9300]][search/phase/scan/scroll]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [868242]
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:459)
    at org.elasticsearch.search.SearchService.executeScan(SearchService.java:208)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:697)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:686)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:268)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

Could it be that I have to set some timeouts longer?

Thanks,
Kris.

On Wednesday, March 13, 2013 6:52:35 PM UTC-4, Jörg Prante wrote:
> I just hacked together a snapshot version and gave it only a quick test,
> but can you try whether it works better? 1.1.0-SNAPSHOT now uses Apache
> Commons Compress for writing the tar archive.
>
> https://github.com/jprante/elasticsearch-knapsack/blob/master/downloads/elasticsearch-knapsack-1.1.0-SNAPSHOT.zip?raw=true
>
> Best,
>
> Jörg

Yes. Do you have some load on the system, or very large documents? There
is a REST parameter "millis", which defaults to 30000 (30 seconds). You
should increase it.
Another option is to reduce the result size via the REST parameter "size"
from 1000 (the default) to a lower value.

Best,

Jörg
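
As a rough illustration of how these two parameters could be passed, here is a small Python sketch that only builds the request URL. The parameter names `millis` and `size` and their defaults come from the message above; the host, index name and `_export` path are assumptions for illustration, so check the plugin README for the real endpoint.

```python
from urllib.parse import urlencode


def export_url(endpoint, millis=30000, size=1000):
    """Append the export tuning parameters to `endpoint` as a query string.

    The defaults mirror the values quoted in the thread: millis=30000, size=1000.
    """
    if millis <= 0 or size <= 0:
        raise ValueError("millis and size must be positive")
    return endpoint + "?" + urlencode({"millis": millis, "size": size})


if __name__ == "__main__":
    # Hypothetical host, index name and path; not taken from the plugin docs.
    base = "http://localhost:9200/myindex/_export"
    # A longer timeout and smaller batches, for a loaded cluster or large documents.
    print(export_url(base, millis=120000, size=500))
```

Raising `millis` and lowering `size` both aim at the same thing: each batch should be fetched and written before the timeout runs out.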

On 19.03.13 22:37, krispyjala wrote:
> Thanks for the snapshot, Jörg. However, I am running into this error
> and so the export stops:
>
> [2013-03-19 14:32:49,306][DEBUG][action.search.type ] [Ron Weasley] [868242] Failed to execute query phase
> org.elasticsearch.transport.RemoteTransportException: [Hermione Granger][inet[/192.168.1.131:9300]][search/phase/scan/scroll]
> Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [868242]
>     at org.elasticsearch.search.SearchService.findContext(SearchService.java:459)
>     at org.elasticsearch.search.SearchService.executeScan(SearchService.java:208)
>     at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:697)
>     at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:686)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:268)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> Could it be that I have to set some timeouts longer?
>
> Thanks,
> Kris.

Thanks again, Jörg! After I set the millis to 600000, the export completed
without any errors.

Now, as I was testing the import, I noticed that the bulk requests were going
out in batches of 100. Is that customizable as well?

Thanks,
Kris.

On Tuesday, March 19, 2013 7:56:03 PM UTC-4, Jörg Prante wrote:
> Yes. Do you have some load on the system, or very large documents? There
> is a REST parameter "millis", which defaults to 30000 (30 seconds). You
> should increase it.
> Another option is to reduce the result size via the REST parameter "size"
> from 1000 (the default) to a lower value.
>
> Best,
>
> Jörg

Yes, the REST parameter is "bulk_size". Sorry for the confusion over the
different names; that will be cleaned up.

Jörg
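
To get a feel for what the batch size means for an index of the size discussed in this thread, here is a small Python sketch. Only the parameter name `bulk_size`, the observed batch size of 100 and the 11 million documents come from the thread; the host, index name and `_import` path are assumptions for illustration.

```python
import math
from urllib.parse import urlencode


def bulk_requests(total_docs, bulk_size):
    """Number of bulk requests needed to index `total_docs` documents."""
    if bulk_size <= 0:
        raise ValueError("bulk_size must be positive")
    return math.ceil(total_docs / bulk_size)


def import_url(endpoint, bulk_size):
    """Append the bulk_size parameter to `endpoint` as a query string."""
    return endpoint + "?" + urlencode({"bulk_size": bulk_size})


if __name__ == "__main__":
    total = 11_000_000  # document count mentioned earlier in the thread
    for size in (100, 1000):
        print(size, "docs per batch:", bulk_requests(total, size), "requests")
    # Hypothetical host, index name and path; not taken from the plugin docs.
    print(import_url("http://localhost:9200/myindex/_import", 1000))
```

Fewer, larger requests are not automatically faster; the right value depends on document size and cluster load.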

On 20.03.13 19:50, krispyjala wrote:
> Thanks again, Jörg! After I set the millis to 600000, the export
> completed without any errors.
>
> Now, as I was testing the import, I noticed that the bulk requests were
> going out in batches of 100. Is that customizable as well?
>
> Thanks,
> Kris.

Again, works like a charm!

I was wondering though if there's a way to cancel the import mid-way?

Thanks for answering my questions!

-- Kris.

On Wednesday, March 20, 2013 12:01:22 PM UTC-7, Jörg Prante wrote:
> Yes, the REST parameter is "bulk_size". Sorry for the confusion over the
> different names; that will be cleaned up.
>
> Jörg
