I've got a two-node ES cluster running on two four-core HPs and
feeding data into a pretty fast datastore.
When I start my logstash -> Elasticsearch pipeline, I'm able to get
about 1200 documents indexed per second. This is far in excess of the
speed at which they're coming in over the AMQP pipe, so it's
sufficient for my needs.
After a few hours of running, however, performance drops to a mere
60-80 documents indexed per second. Why does this happen, and what can
I do to fix it? Emptying out my path.data location speeds things back
up immediately, but of course it wipes everything out in the process.
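To pin down exactly when the rate falls off, one thing that helps is sampling the cluster-wide indexing counter and diffing consecutive samples. This is just a sketch, and it assumes the indices stats API is at _stats and reports an index_total counter, so check both against the ES version in use:

    # Sample the indexing counter every 30 seconds; the difference between
    # consecutive samples is roughly the number of documents indexed in
    # that interval.
    while true; do
      date
      curl -s 'http://localhost:9200/_stats?pretty' | grep '"index_total"' | head -n 1
      sleep 30
    done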
I'm seeing lots of I/O wait (40-75%) on the nodes. Increasing to
ES_MAX_MEM="8g" ES_HEAP_SIZE="8g" has made the bursts of activity on
disk more productive. This is all happening on a shared storage system
with all of our VM data on it, mounted to an ESXi as a VM disk over
NFS.
I'm just barely able to keep up with the input now, but is that much
I/O wait normal for ES? Would I be significantly better off with local
storage?
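The iowait figure above reads like the host-level "wa" CPU time that the standard tools report per node. A minimal way to watch it, assuming the sysstat package is installed so iostat is available:

    # CPU iowait plus per-device utilisation, sampled every 5 seconds
    iostat -x 5
    # Coarser view: the "wa" column is the percentage of CPU time spent
    # waiting on I/O
    vmstat 5

Running the same samples on a node with local disk is probably the most direct way to see whether the NFS-backed datastore is the bottleneck.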
I think you would be better off with local storage, yes. I'm having
the same problem: slow inserts from VMs on shared storage.
I'm not sure what can be done from the ES side, especially since other
databases seem to suffer just as much under the same conditions. Which
kind of makes sense: the changes have to be flushed to disk every once
in a while.
There might be some settings to optimize this, but I'm not aware of
any that help. All I found was here:
If you find any solution, please write it up here as well. I would
really appreciate it.
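For what it's worth, the two knobs that usually come up for heavy indexing are the refresh interval and the replica count; no promises that they help with NFS-induced iowait, but both are dynamic index settings, so they're cheap to try. A sketch from memory, with a made-up index name and syntax that may need adjusting for your ES version:

    # Lengthen the refresh interval and drop replicas while bulk loading;
    # put them back once the backlog is cleared. Index name is an example.
    curl -XPUT 'http://localhost:9200/logstash-2013.06.01/_settings' -d '
    {
      "index": {
        "refresh_interval": "30s",
        "number_of_replicas": 0
      }
    }'

The default refresh_interval is 1s, so stretching it means fewer, larger segments per refresh and generally less merge churn on disk.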