Flush API and Garbage Collection

I am trying to understand the Flush API and some JVM-related issues I am seeing.

The Flush API in the guide says:
The flush API allows to flush one or more indices through an API. The flush
process of an index basically frees memory from the index by flushing data
to the index storage and clearing the internal transaction log. By default,
ElasticSearch uses memory heuristics in order to automatically trigger
flush operations as required in order to clear memory.

Does this mean that if I make a Flush request manually, everything
transaction log related is ready for garbage collection?
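
For reference, the manual flush I am issuing looks something like this (the
index name is just a placeholder):

curl -XPOST 'http://localhost:9200/myindex/_flush'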

Vaidik Kapoor
vaidikkapoor.info

It means it clears out the transaction log (translog) for the specified index
and makes that memory eligible for GC.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

Okay, thanks for clearing that up, Mark.

So here is what I am noticing, and it is giving me a lot of trouble. I have
two nodes with 32 GB RAM, of which half is allocated to ES. I am just
building something out and wanted to see how fast I can write to ES, so at
the moment I am only indexing into ES and not querying at all. After a couple
of hours I saw heap usage at about 97%, and GC was taking very long to run
(on the order of many seconds) and was running frequently without freeing
much heap for reuse. Then I stopped indexing and did nothing for a while. I
manually flushed the index and waited for GC to free up some memory. Sadly,
that is not what I observed.

Since I am new to ES, and after reading whatever I could find so far, I am
not able to work out what else ES might be using the heap for, especially
when I am not indexing anything, am not querying the nodes either, and have
manually flushed the index using the Flush API. What else might be causing
such high heap usage?

This concerns me because the write speed drastically drops in such
situations.

Any help would be appreciated.

Vaidik Kapoor
vaidikkapoor.info

The smallest part of a shard is a segment, and Lucene caches data at that
level, which is most likely what you are seeing residing in your heap. ES
does cache data aggressively so that queries are as fast as possible.

(This is obviously dependent on your data set size.)
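
If you want to see what segments each shard is made up of, the segments API
will list them per shard (the index name here is just an example):

curl -XGET 'http://localhost:9200/myindex/_segments?pretty'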

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

You do not mention the ES version, the heap size you use, or the data volume
you handle when indexing, so it is not easy to help.

Anyway, from the situation you describe, what you observe does not have much
to do with flush or the translog. Most probably it is segment merging. After
a few hours of constant indexing your segments grow larger and larger, and
reloading those segments for merging is what fills the heap.

Note that the default maximum merged segment size is 5 GB. That means
segments may grow up to this size and be loaded into the heap for merging.
In bad cases a merge may take a long time, long enough for nodes to
disconnect from the cluster because they cannot respond to the heartbeat
signal from the other nodes.

You should try streamlining your indexing by choosing smaller maximum
segment sizes. Example:

index.merge.policy.max_merged_segment: 2g
index.merge.policy.segments_per_tier: 24
index.merge.policy.max_merge_at_once: 8
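
If your ES version allows these merge policy settings to be updated
dynamically, something along these lines should apply them to an existing
index (the index name is a placeholder); otherwise set them in
elasticsearch.yml and restart:

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.merge.policy.max_merged_segment": "2g",
  "index.merge.policy.segments_per_tier": 24,
  "index.merge.policy.max_merge_at_once": 8
}'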

You can also try experimenting with the number of shards per node. The more
shards, the longer it takes before segments get big, but more shards also
mean more resource consumption per node.

Jörg

Hey,

Before tuning, it would be handy to know what is in the heap and how big it
is. You can use the monitoring APIs to gather more information while the
heap is filling up during indexing. There might also be log entries about
slow garbage collections.
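
For example, the node stats API shows per-node heap usage and GC activity
(assuming you run it against one of your nodes):

curl -XGET 'http://localhost:9200/_nodes/stats?pretty'

Watch the jvm section (heap used, GC collection counts and times) while
indexing is running.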

--Alex

Hi Jörg,

=> You can also try experimenting with the number of shards per node. The
more shards, the longer it takes before segments get big. But, more shards
also mean more resource consumption per node.

How can I see the resource consumption per node? Is there a good indicator
(e.g. an API or monitoring tools)?

/Jason

For monitoring you can use ElasticHQ, kopf or bigdesk.
These all take API output and turn it into something a little more
digestible.
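
The underlying API output is available directly as well, for example cluster
health and per-index stats (the index name is just an example):

curl -XGET 'http://localhost:9200/_cluster/health?pretty'
curl -XGET 'http://localhost:9200/myindex/_stats?pretty'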

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com
