Insert later feature


(vineeth mohan-2) #1

Hi,

I am doing a lot of bulk inserts into Elasticsearch and, at the same time,
a lot of reads against another index.

Because of the bulk inserts, my searches on the other index are slow.

It is not urgent that the bulk-inserted documents get indexed and become
searchable immediately.

Is there any way I can ask Elasticsearch to accept the bulk inserts but
do the actual indexing (which should be the CPU-consuming part) later?

I found that Elasticsearch waits 1 second before making the
documents searchable.
What is it waiting for here? Is it indexing the documents, or reopening
the indexWriter?
Would it help if I changed this 1 second to 1 hour?
If so, which parameter should I tweak?

Please let me know if there are any other similar features that
could help.

Thanks
Vineeth

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kwxwB%2Bi%3DHZDS1y%2B6Ad-VTax8hLSpgSVaSNH7CbzagB3Q%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

The best way to achieve this would be to implement it in front of ES, so
the bulk indexing client runs only at the time it should run.

For the gathering plugin I am working on, I plan to separate the two
phases of gathering documents and indexing documents. With a
scheduling option, it will be possible to index (or even reindex) gathered
documents at a later time. For example, documents are continuously
collected from various sources, such as JDBC, the web, or the file system,
and then indexed at some later time (for example, at night). The collected
documents will be stored in an archive format at each gatherer node, like
the archive formats supported in the knapsack plugin.
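
A minimal sketch of the "in front of ES" idea, assuming a simple local spool file rather than the gathering or knapsack plugin formats (which are not shown in this thread). Documents are appended to a newline-delimited JSON file as they arrive, and a scheduled job later turns the file into bulk request bodies. The function names, the file layout, and the batch size are hypothetical; older Elasticsearch versions also expect a `_type` in the action line.

```python
import json


def spool_document(spool_path, doc):
    """Append one document as a JSON line to a local spool file (hypothetical layout)."""
    with open(spool_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(doc) + "\n")


def bulk_bodies(spool_path, index, batch_size=1000):
    """Yield bulk request bodies (action line + source line per document)."""
    # One "index" action line is emitted before every document source line.
    action = json.dumps({"index": {"_index": index}})
    lines = []
    with open(spool_path, encoding="utf-8") as fh:
        for raw in fh:
            raw = raw.strip()
            if not raw:
                continue
            lines.append(action)
            lines.append(raw)
            if len(lines) >= 2 * batch_size:
                # The bulk format requires a trailing newline.
                yield "\n".join(lines) + "\n"
                lines = []
    if lines:
        yield "\n".join(lines) + "\n"
```

Each yielded body could then be sent to the `_bulk` endpoint by a job that runs during quiet hours, so the indexing cost is paid when searches are not competing for CPU.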

Jörg



(Michael Sick) #3

Also, if no other clients need a faster refresh, you can
set index.refresh_interval to a value higher than the 1s default, either
permanently for your index or only while you are running your bulk
updates.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html
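
As a rough sketch, the setting can be changed at runtime through the index settings endpoint. The helper below only builds the HTTP request using the Python standard library; the host, index name, and interval values are placeholders, and the function name is made up for illustration.

```python
import json
import urllib.request


def refresh_interval_request(base_url, index, interval):
    """Build a PUT request that updates index.refresh_interval for one index."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Example (placeholder host and index): relax the refresh before a bulk load,
# then restore the default afterwards.
# urllib.request.urlopen(refresh_interval_request("http://localhost:9200", "bulk_index", "30s"))
# urllib.request.urlopen(refresh_interval_request("http://localhost:9200", "bulk_index", "1s"))
```

A value of "-1" disables the periodic refresh entirely, which may suit a load where nothing needs to be searchable until the end.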



(vineeth mohan-2) #4

Hello Michael - Thanks for the configuration.

Hello Jörg - I was thinking more along the lines of the translog:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-translog.html

I believe the index operation is first written to the translog (which I am
not sure is part of Lucene) and then written to Lucene later.
If we can ask ES to accumulate a large number of feeds and
index them later, will that do the trick?

Thanks
Vineeth



(Jörg Prante) #5

Yes, it is possible to disable the translog sync (the component where the
operations are passed from ES to Lucene) with index.gateway.local.flush: -1
and use the flush action for "manual commit" instead.

I have never done that in practice, though.

Jörg



(Jörg Prante) #6

Oops, the correct parameter is index.translog.disable_flush: true

index.gateway.local.flush: -1 controls the gateway.
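
A hedged sketch of how this could look over HTTP, assuming the setting name given above is accepted by the index settings endpoint in the version in use (this is not verified here, and the setting may not exist in other releases). The helpers only build the requests; host, index name, and function names are placeholders. As far as I understand it, a flush corresponds to a Lucene commit, so this postpones commits rather than the analysis work done when a document is received.

```python
import json
import urllib.request


def disable_flush_request(base_url, index, disabled):
    """Build a PUT request toggling index.translog.disable_flush (name taken from the thread)."""
    body = json.dumps({"index.translog.disable_flush": disabled}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def flush_request(base_url, index):
    """Build a POST request for the flush action, used here as the "manual commit"."""
    return urllib.request.Request(url=f"{base_url}/{index}/_flush", method="POST")


# Example sequence (placeholder host and index), sent with urllib.request.urlopen:
#   disable_flush_request("http://localhost:9200", "bulk_index", True)
#   ... run the bulk inserts ...
#   disable_flush_request("http://localhost:9200", "bulk_index", False)
#   flush_request("http://localhost:9200", "bulk_index")
```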

Jörg



(vineeth mohan-2) #7

Hello Joerg,

So if I disable it, ES won't write the feeds to Lucene until I do a
manual flush.
I believe the translog is written to a file and is not resident in memory.
This also means that translogs are preserved across restarts and we will
never lose data.

If all of the above is right, then this might be a good candidate for my
purpose.

Thanks
Vineeth



(vineeth mohan-2) #8

Hello Joerg,

I am still wondering how well this will handle a case where I have, say, 10
million entries in the translog and I ask ES to index them all in a
single flush.
Is a heap dump likely to happen?

Thanks
Vineeth

On Mon, Feb 24, 2014 at 1:08 AM, vineeth mohan vm.vineethmohan@gmail.comwrote:

Hello Joerg ,

So if i disable it , ES wont write the feeds to lucene until i make a
manual flush...
I believe translog is written to a file and its not resident in the memory.
This also means that translogs are maintained between restarts and we will
never loose data.

If all the above are right , then this might be a good candidate for my
purpose.

Thanks
Vineeth

On Mon, Feb 24, 2014 at 12:54 AM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

Oops, the correct parameter is index.translog.disable_flush : true

index.gateway.local.flush: -1 is controlling the gateway.

Jörg

On Sun, Feb 23, 2014 at 8:21 PM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

Yes, it is possible to disable the translog sync (the component where
the operations are passed from ES to Lucene) with
index.gateway.local.flush: -1 and use the flush action for "manual commit"
instead.



(vineeth mohan-2) #9

Hello Joerg,

Your config doesn't seem to work.
I set the following parameter, and while I was doing some inserts there
was no unusual behavior. The head showed the total number of documents I
had inserted, and they were searchable.

index.translog.disable_flush : true

ES version - 0.90.9

Is there something I missed?

Thanks
Vineeth



(vineeth mohan-2) #10

Hi,

I tried the following too, without any luck:

curl -XPUT 'localhost:9200/documents/_settings' -d '{
  "index" : {
    "translog" : {
      "disable_flush" : true
    }
  }
}'
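
For reference, the nested JSON body in this request and the dotted form index.translog.disable_flush : true from the earlier message name the same setting, since Elasticsearch flattens nested settings into dotted keys. A minimal sketch of that flattening follows; the helper name flatten_settings is hypothetical and is not part of any Elasticsearch client.

```python
def flatten_settings(nested, prefix=""):
    """Flatten a nested settings dict into dotted keys (illustrative helper)."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-objects, extending the dotted path.
            flat.update(flatten_settings(value, path))
        else:
            flat[path] = value
    return flat


if __name__ == "__main__":
    # Same body as the curl request in this message.
    body = {"index": {"translog": {"disable_flush": True}}}
    print(flatten_settings(body))
```

This only shows that both spellings address one setting; it says nothing about whether the setting takes effect on a running index.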

Thanks
Vineeth



(Jörg Prante) #11

It's not a dynamic setting, afaik.

Sorry, I don't know for sure how a translog can grow forever.

For my purposes, I decided to handle the challenge in front of ES, with
better timing control and archive files for replay that I can also use
outside of ES.
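
The "in front of ES" approach can be sketched roughly as below. This is only an illustration under stated assumptions, not the gathering or knapsack plugin: the function names (spool, batches, replay), the newline-delimited archive file, and the send callback are all hypothetical. Documents are appended to a local file as bulk-format action/source line pairs, and a later run replays the file in batches through whatever function actually posts to the _bulk endpoint.

```python
import json
import os
import tempfile
from typing import Callable, Iterator, List


def spool(archive_path: str, index: str, doc_type: str, doc: dict) -> None:
    """Append one document to the archive as a bulk action line plus a source line."""
    action = {"index": {"_index": index, "_type": doc_type}}
    with open(archive_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(action) + "\n")
        f.write(json.dumps(doc) + "\n")


def batches(archive_path: str, docs_per_batch: int) -> Iterator[str]:
    """Yield bulk bodies that hold at most docs_per_batch documents each."""
    lines: List[str] = []
    with open(archive_path, encoding="utf-8") as f:
        for line in f:
            lines.append(line)
            # Each document occupies two lines: action and source.
            if len(lines) == 2 * docs_per_batch:
                yield "".join(lines)
                lines = []
    if lines:
        yield "".join(lines)


def replay(archive_path: str, send: Callable[[str], None],
           docs_per_batch: int = 1000) -> int:
    """Pass every batch to `send` (assumed to post it to ES); return the batch count."""
    count = 0
    for body in batches(archive_path, docs_per_batch):
        send(body)
        count += 1
    return count


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "spool.ndjson")
    for n in range(5):
        spool(path, "documents", "doc", {"n": n})
    # Stand-in for a real sender: just report the size of each batch.
    replay(path, lambda body: print(len(body.splitlines()), "lines"), docs_per_batch=2)
```

Because the archive is an ordinary file, the replay can be scheduled for a quiet period (for example at night), and the same file can be reused outside of ES.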

Jörg



(system) #12