Indexing slows down dramatically as index size grows

Hey guys,

I've got a two-node ES cluster running on two four-core HPs and
feeding data into a pretty fast datastore.

When I start my Logstash -> Elasticsearch pipeline, I'm able to get
about 1200 documents indexed per second. This is far in excess of the
speed at which they're coming in over the AMQP pipe, so it's
sufficient for my needs.

After a few hours of running, however, performance drops to a mere
60-80 documents indexed per second. Why does this happen, and what can
I do to fix it? Emptying out my path.data location speeds things back
up immediately, but wipes everything out in the process, of course.

Thanks,
Greg Rice

Hi Greg,

How much memory do you have allocated for ES? The rule of thumb would
be around half the total amount of RAM.
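For example, on a box with 16 GB of RAM I would give ES an 8 GB heap. With the standard startup scripts that is just an environment variable; the file path below is a guess, it depends on how you installed ES:

    # e.g. in /etc/default/elasticsearch or bin/elasticsearch.in.sh
    ES_HEAP_SIZE=8g    # the startup script uses this for both -Xms and -Xmx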

Also, can you check the logs to see if you run into a max-open-files
limit?
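You can check the limit for the user ES runs as, and grep the logs for the usual symptom; the log path is a guess, adjust it to your install:

    ulimit -n                                            # open-files limit for the current user/shell
    grep -i "too many open files" /var/log/elasticsearch/*.log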

Also, it might be useful to take a look at this thread:
http://elasticsearch-users.115913.n3.nabble.com/Problems-with-GrayLog2-ES-setup-long-td3859362.html

I suppose that optimizations for Graylog would also apply to Logstash.

Radu,

I'm seeing lots of I/O wait (40-75%) on the nodes. Increasing the heap
to ES_MAX_MEM="8g" ES_HEAP_SIZE="8g" has made the bursts of disk
activity more productive. This is all happening on a shared storage
system with all of our VM data on it, mounted to an ESXi host as a VM
disk over NFS.

I'm able to just barely keep up with input now, but is I/O wait like
that normal for ES? Would I be a whole ton better off with local
storage?
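(For what it's worth, the I/O wait numbers above are just what the standard OS tools report, e.g. something like:

    top            # the "wa" value in the CPU summary line is I/O wait
    iostat -x 5    # per-device await and utilization every 5 seconds (needs sysstat)

nothing ES-specific.)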

Thanks,
Greg Rice

Hi Gregory,

I think you would be better off with local storage, yes. I'm having
the same problem, with slow inserts from VMs with shared storage.

I'm not sure what can be done from the ES side, especially since other
databases seem to suffer as well in the same conditions. Which kind of
makes sense: you have to dump your changes to disk every once in a
while.
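The only directly related knob I can think of is how often ES refreshes the index: a longer refresh interval means fewer tiny segment writes, and therefore less random I/O, at the cost of new documents taking longer to show up in searches. As a rough illustration only (the index name is a placeholder, and please check the exact setting name against the docs for your version):

    curl -XPUT 'http://localhost:9200/your_index/_settings' -d '{
      "index" : { "refresh_interval" : "30s" }
    }'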

Beyond that, there might be other settings to optimize this, but I'm
not aware of one that reliably helps. All I found was here:

If you find any solution, please write it here as well. I would really
appreciate it :slight_smile:

Best regards,
Radu
