Disk usage of gateway


(Andrew Degtiariov) #1

Hello!

Today I received an alert about no free space on the 100 GB EBS partition
which I use as the "work" directory for a single-node ES installation.
98 GB was used by the gateway. The data fed into ES comes from MongoDB and,
including indexes and other data that is not indexed in ES, takes only 11 GB in total.
I moved all the gateway data to the /mnt partition (it has 500 GB) and
started ES again, but shut it down after it had been stuck in the
recovery state for more than 30 minutes.
For comparison, a full indexation takes only 25 minutes, and the gateway
now occupies only 22 GB (21 GB after optimizing the indexes).

Is there a way to clean up the gateway? And is there a way to decrease
the index recovery time?

--
Andrew Degtiariov
DA-RIPE


(Shay Banon) #2

The gateway should get cleaned up automatically. Are you storing both the
gateway and the work dir in the same location?



(ppearcy) #3

We noticed that our work/gateway sizes were ballooning, and it appeared
that the "flush" command was not being executed often enough. According to
the ES docs, it is executed based on memory heuristics. I'm not sure
what type of memory (disk or RAM) or what the thresholds are, though.

We ended up calling flush after every 10K docs.

Regards,
Paul
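The periodic-flush approach described above can be sketched as a simple indexing loop. This is an illustrative sketch, not the poster's actual code: `index_doc` and `flush` are hypothetical callables standing in for whatever HTTP client wraps the document-index request and `POST /_flush`.

```python
# Sketch: call flush after every N indexed documents.
# index_doc and flush are hypothetical wrappers around the ES HTTP API.
FLUSH_EVERY = 10_000

def index_all(docs, index_doc, flush):
    """Index every doc, flushing the translog every FLUSH_EVERY docs."""
    count = 0
    for doc in docs:
        index_doc(doc)
        count += 1
        if count % FLUSH_EVERY == 0:
            flush()
    # Final flush so the translog is truncated before shutdown.
    flush()
```

The trade-off is that each flush commits a Lucene segment, so flushing too often creates many small segments; every 10K docs is a middle ground between translog growth and segment churn.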



(Clinton Gormley) #4


By default, flush is called every 5000 operations, on a per-shard
basis.

This number can be controlled by setting:

index.translog.flush_threshold

I can't find docs for this, but see this message:
http://groups.google.com/a/elasticsearch.com/group/users/msg/06d62ea3ceb4db30

clint
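As a reference, lowering the threshold would look something like the fragment below in `elasticsearch.yml`; the value shown is purely illustrative, not a recommendation:

```yaml
# elasticsearch.yml -- flush the translog after fewer operations per shard
index:
  translog:
    flush_threshold: 2500   # default is 5000 operations
```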


(ppearcy) #5

Thanks Clinton! That is good to know. So, if you have lots of
indexes, you might want to lower this default.



(Shay Banon) #6

There is a problem in 0.10 where the translog in the gateway was not being
appended correctly, and data was getting accumulated each time instead of
adding just the diff. This does not affect the correctness of the gateway,
but it does imply more storage needs. I have fixed it in 0.11.



(Andrew Degtiariov) #7

On Fri, Sep 24, 2010 at 8:02 PM, Shay Banon shay.banon@elasticsearch.com wrote:

The gateway should get cleaned up automatically. Are you storing both the
gateway and the work dir in the same location?

Sorry for the delayed reply.
Yes, we store both the gateway and the work dir in the same location. After
moving the work dir (with the gateway) to the 500 GB partition, ES filled it in 3 days.
So I was forced to disable the gateway (and in my case a full indexation is
3x faster than recovering from the gateway).

--
Andrew Degtiariov
DA-RIPE
