Hey,
I am trying to migrate (copy) 35 million documents (a fairly standard
amount, not too big) from Couchbase to Elasticsearch.
My Elasticsearch cluster consists of 3 A3 (4 cores, 7 GB memory) CentOS
servers on Microsoft Azure (each server is roughly equivalent to a large
server on Amazon).
I use time-based ("timing data flow") indices to store the documents: each
index represents one month and has 3 shards and 2 replicas.
When I start the migration script, insertion becomes very slow (about 10
documents per second) and the load average of each server in the cluster
climbs above 1.5.
In addition, JVM memory usage rises to almost 100%, while CPU stays at
about 20% and IOPS peaks at 20.
(I used Marvel CNC to collect all of this data.)
Has anyone faced this kind of indexing problem in Elasticsearch?
Are there any parameters I should be aware of for increasing the Java
memory?
Are my cluster specifications good enough to handle 100 indexing operations
per second?
Does the indexing time depend on how big the index is, and should it be
that slow?
I also opened a thread on Stack Overflow, if anyone wants to follow it there:
Follow ES's advice on node setup: allocate 50% of the available memory to
the ES Java heap, don't run anything else on that machine, and disable
swapping.
Your index is already sharded; try spreading the shards across 3 different
servers instead of keeping them all on one server ("virtual shards"). This
will help fan out the indexing load.
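To make the fan-out concrete, here is a rough back-of-the-envelope sketch in Python using the numbers from the question (3 shards, 2 replicas, 3 servers). The function names are made up for illustration, and it assumes shard copies end up evenly balanced across the servers.

```python
def shard_copies_per_node(primaries: int, replicas: int, nodes: int) -> float:
    """Average number of shard copies (primaries plus replicas) per node for one index."""
    total_copies = primaries * (1 + replicas)
    return total_copies / nodes


def writes_per_document(replicas: int) -> int:
    """Each document is indexed once on the primary and once on every replica."""
    return 1 + replicas


# Numbers from the question: 3 shards, 2 replicas, 3 servers.
print(shard_copies_per_node(primaries=3, replicas=2, nodes=3))  # 3.0
print(writes_per_document(replicas=2))  # 3
```

With 2 replicas on 3 servers, every document is written 3 times, so each server ends up indexing every document. Spreading the shards balances the work, but the replica count still multiplies it.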
If you don't specify the document IDs yourself, make sure you use the
latest ES release: it has a significant improvement in the ID generation
mechanism, which could help speed things up.
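As an illustration of leaving ID generation to ES, the sketch below builds request bodies in the newline-delimited JSON format of the bulk API, with no `_id` in the action lines. Using the bulk API at all is an assumption here (the thread does not say how the migration script sends documents), and `monthly_index`, `bulk_bodies`, the `docs` prefix and the batch size are hypothetical names and values. Older ES releases also expect a `_type`, either in the action line or in the request URL; that is left out here.

```python
import json
from datetime import date
from typing import Dict, Iterable, Iterator, List, Tuple


def monthly_index(prefix: str, day: date) -> str:
    """Name of the month-based index a document dated `day` belongs to."""
    return f"{prefix}-{day:%Y.%m}"


def bulk_bodies(
    docs: Iterable[Tuple[date, Dict]],
    prefix: str,
    batch_size: int = 500,
) -> Iterator[str]:
    """Yield bulk request bodies, each holding at most `batch_size` documents.

    No `_id` is set in the action line, so ES generates the IDs itself.
    """
    lines: List[str] = []
    for day, source in docs:
        action = {"index": {"_index": monthly_index(prefix, day)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
        if len(lines) >= 2 * batch_size:
            # A bulk body must end with a trailing newline.
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"


sample = [
    (date(2014, 9, 14), {"user": "a"}),
    (date(2014, 10, 1), {"user": "b"}),
]
for body in bulk_bodies(sample, prefix="docs", batch_size=500):
    print(body, end="")
```

Each yielded string would be sent as the body of one bulk request. The batch size is something to tune by experiment, not a recommended value.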
On Sun, Sep 14, 2014 at 11:38 AM, Niv Penso nivp@toonimo.com wrote: