Why is ES memory consumption rising while I bulk index parent/child documents?

I am indexing parent/child documents at a rate of 1500/second; my index size is 57 million records combined (as I write this), and I do not perform any search while I index.

My memory consumption rises steadily while I index, at about 1 GB per 20 minutes. What is the reason for this? If I shut down and restart, it drops and then rises again.

The indexing speed is reduced after some time.

I would like to control and maintain indexing speed. Please advise.


Hi David,

I think it's merging (http://www.elasticsearch.org/guide/reference/index-modules/merge/) that lowers your indexing speed as the index gets bigger, not the increased memory.

At some point, your memory should get freed up by the garbage collector, so the steady increase is normal up to a point.

If you need to maintain the indexing speed, the solution depends on what your data looks like. For time-rolling data, such as logs, a good thing would be to have time-based indices (e.g. one per day). This way, you have some control over how big your indices will become.
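
For illustration, here is a rough Python sketch of that idea (the index prefix, date format, host, and document are made-up placeholders): route each document to an index named after its day, so older indices can be left alone or dropped.

import datetime
import json
import requests  # assumes the requests library and an ES node on localhost:9200

def index_name_for(timestamp, prefix="logs"):
    # e.g. logs-2013.06.20 -- one index per day
    return "%s-%s" % (prefix, timestamp.strftime("%Y.%m.%d"))

doc = {"message": "example event", "@timestamp": "2013-06-20T10:38:00"}
name = index_name_for(datetime.datetime(2013, 6, 20))
requests.post("http://localhost:9200/%s/event/" % name, data=json.dumps(doc))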


"At one moment, your memory should get freed up by the Garbage Collector,
so the steady increase feels normal up to a point."

You mean that the increase will be capped?

The nature of my data is data about a user. All of the documents are almost the same size, but there are 150 fields per document; I do not analyze them, as they are one-word fields.
I played with the merge policy and set the number of concurrent merges to 2.

Are there more settings I can set to improve performance?

The size of the data set is 180 million records, from which I generate 100 million parents.


On Thu, Jun 20, 2013 at 11:53 AM, David MZ david.mazvovsky@gmail.com wrote:

"At one moment, your memory should get freed up by the Garbage Collector,
so the steady increase feels normal up to a point."

You mean that the increase will be capped?

Normally, yes, although garbage collection is CPU-intensive, which might hurt performance while it runs. But you should test and see how it all works for you.

The nature of my data is data about a user. All of the documents are almost the same size, but there are 150 fields per document; I do not analyze them, as they are one-word fields.
I played with the merge policy and set the number of concurrent merges to 2.

Are there more settings I can set to improve performance?

Indexing performance? Yes, but it depends on your use case. Have a look at the following (a rough sketch follows the list):

  • bulk indexing
  • increasing or even disabling refresh_interval during indexing
  • translog parameters
  • indices.memory.index_buffer_size
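
To make that concrete, here's a rough Python sketch of the refresh_interval and bulk points (the index name "users", the type, and the documents are made-up placeholders; this shows the raw HTTP calls rather than any particular client library):

import json
import requests  # assumes the requests library and ES on localhost:9200

ES = "http://localhost:9200"
INDEX = "users"

def set_refresh(interval):
    # refresh_interval is a dynamic index setting
    requests.put("%s/%s/_settings" % (ES, INDEX),
                 data=json.dumps({"index": {"refresh_interval": interval}}))

def bulk_index(docs):
    # newline-delimited bulk body: one action line plus one source line per doc
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": INDEX, "_type": "user", "_id": doc_id}}))
        lines.append(json.dumps(source))
    requests.post("%s/_bulk" % ES, data="\n".join(lines) + "\n")

set_refresh("-1")   # stop refreshing while the big load runs
bulk_index([(1, {"name": "alice"}), (2, {"name": "bob"})])
set_refresh("1s")   # restore the default afterwards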

Ah, here's an interesting gist:

And you should also monitor your cluster while testing to see what your bottleneck is. This will help you know what makes sense to improve. If you need a monitoring tool, check out our SPM:

Best regards,
Radu


On 20.06.13 10:38, David MZ wrote:

I am indexing parent/child documents at a rate of 1500/second; my index size is 57 million records combined (as I write this), and I do not perform any search while I index.

My memory consumption rises steadily while I index, at about 1 GB per 20 minutes.

How much is your memory consumption? Which process? How many nodes do you have in your cluster?

What is the reason for this? If I shut down and restart, it drops and then rises again.

Do you shut down the server or the client? If you start a process, memory usage will grow again; nothing to worry about.

The indexing speed is reduced after some time

By how much is the indexing speed reduced?

I would like to control and maintain indexing speed. Please advise.

Please give additional information about your server and client config
and the numbers you observed.

If you index millions of documents, ES has more and more work to do managing and merging the growing Lucene segments, which can consume all of the available CPU and memory, mostly observable as sudden slowdowns ("spikes"). If you throttle indexing, you can better handle the merging spikes. There are many methods you can choose from to streamline indexing, on the client or on the server side.
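
One client-side way to throttle, as a rough sketch (batch size and pause are arbitrary example values to tune on your own hardware; send_batch stands in for whatever actually posts to _bulk):

import time

def throttled_bulk(docs, send_batch, batch_size=1000, pause_seconds=0.5):
    # Send fixed-size bulk batches with a pause in between, so segment
    # merges get some CPU and I/O headroom instead of competing with a
    # continuous stream of indexing requests.
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            send_batch(batch)
            batch = []
            time.sleep(pause_seconds)
    if batch:
        send_batch(batch)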

Jörg


"How much is your memory consumption? What process? How much nodes do you
have in your cluster? "

I have only one computer, so just one node with 3 shards.

"How much is the reduced indexing speed? "

The speed drops by 50% and later by 100%; sometimes I get spikes of very bad performance, which I guess are GC or a merge.

A question:

As I do not need to search the index while indexing, can I skip merges entirely while indexing and just do them at the end? My system is not real-time in any sense; I am preparing a static index for read-only use.

I have enough disk space (I think) to leave it unmerged.



On 20.06.13 15:14, David MZ wrote:

The speed drops by 50% and later by 100%; sometimes I get spikes of very bad performance, which I guess are GC or a merge.

If indexing stops, it's not GC and not a merge; in that case you have not configured your indexing routine well enough to adjust it smoothly to your system resources.
How do you index documents?

A question:

As I do not need to search the index while indexing, can I skip merges entirely while indexing and just do them at the end? My system is not real-time in any sense; I am preparing a static index for read-only use.

Merging segments is not for searching; it is inevitable in the process of creating an index. You can control the Lucene merge operation via the merge module settings (http://www.elasticsearch.org/guide/reference/index-modules/merge/).

By reducing the merged segment size, you can shorten the spikes and take pressure off the heap if you have a small heap (max_merged_segment = 2g instead of 5g, for example).
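
As a sketch of where such a setting would go (the index name "users" and host are placeholders; depending on your version you may set this at index-creation time as below, or check whether it can be updated on a live index):

import json
import requests  # assumes the requests library and ES on localhost:9200

settings = {
    "settings": {
        "index.merge.policy.max_merged_segment": "2g",   # smaller ceiling than the 5g default
        "index.merge.scheduler.max_thread_count": 2      # limit concurrent merges
    }
}
# create the index with these merge settings
requests.put("http://localhost:9200/users", data=json.dumps(settings))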

Jörg


"If indexing stops, it's not GC and not a merge. Then you did not configure
your indexing routine well enough to get it smoothly adjusted to your
system resources.
How do you index documents?"

Indexing did not stop, it was just very slow. I use the Tire ruby gem to index, using the bulk index operation.

"By reducing the merged segment size, you can lower the length of spikes
and take pressure from the heap if you have a small heap
(max_merged_segment = 2g instead of 5g for example)"

What will happen if I set it to 500mb?

