Hey,
I asked about this a while ago and got no traction, so I thought I'd try
again with some new info. We are trying to import ~400M docs from CouchDB
into ES using the couchdb-river. Things start out super fast (2,000 docs/s
according to bigdesk), but after a few hours (4-12 hr) the rate falls to
50 docs/s. Restart ES and, boom, it's back to fast indexing. I am using
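To put the two rates in perspective, here is a back-of-the-envelope estimate of how long the whole ~400M-doc import would take at each speed. It assumes the rate stays constant for the entire run, which is exactly what doesn't happen, so treat it only as a rough bound:

```python
# Rough import-time estimate for ~400M docs at the two observed rates.
# Assumes a constant rate for the whole run, purely for illustration.
TOTAL_DOCS = 400_000_000

def import_hours(total_docs: int, docs_per_sec: float) -> float:
    """Return the wall-clock hours needed to index total_docs at docs_per_sec."""
    return total_docs / docs_per_sec / 3600

fast = import_hours(TOTAL_DOCS, 2_000)  # rate bigdesk shows at the start
slow = import_hours(TOTAL_DOCS, 50)     # rate after the 4-12 hr slowdown

print(f"fast: {fast:.1f} h (~{fast / 24:.1f} days)")
print(f"slow: {slow:.1f} h (~{slow / 24:.1f} days)")
```

At 2,000 docs/s that works out to about 55.6 h (roughly 2.3 days); at 50 docs/s it is about 2,222 h (roughly 92.6 days), so the slowdown is the difference between days and months.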
the following settings for the couchdb river:
"bulk_size": "2500",
"bulk_timeout": "40ms"
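For context, this is a sketch of where those two settings sit in the river's `_meta` document, assuming the layout from the couchdb-river README (`bulk_size` and `bulk_timeout` under the `index` section). The host, port, database, index and type values are placeholders, not our real setup:

```python
import json

# Hypothetical river definition: host/db/index/type values are placeholders.
# bulk_size and bulk_timeout live in the "index" section (per the river README).
river_meta = {
    "type": "couchdb",
    "couchdb": {
        "host": "localhost",  # placeholder CouchDB host
        "port": 5984,         # default CouchDB port
        "db": "my_db",        # placeholder database name
        "filter": None,
    },
    "index": {
        "index": "my_db",        # placeholder target index
        "type": "my_db",         # placeholder document type
        "bulk_size": "2500",     # max docs per bulk request (value from above)
        "bulk_timeout": "40ms",  # roughly: how long to wait for more changes
                                 # before sending a partial bulk (value from above)
    },
}

# This body would be PUT to /_river/<river_name>/_meta on the ES node.
print(json.dumps(river_meta, indent=2))
```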
I have also tuned ES (I think) to be as fast as possible for indexing.
I already followed the advice in
http://www.elasticsearch.org/blog/2011/03/23/update-settings.html
and set number_of_shards: 20, number_of_replicas: 0, and
indices.memory.index_buffer_size: 65 to try to speed up indexing.
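One thing worth sanity-checking is how the indexing buffer gets divided. indices.memory.index_buffer_size is a node-level setting (elasticsearch.yml), not an index setting, and the node's buffer is shared roughly evenly across the shards actively indexing on it. The sketch below shows the index-creation body plus that split. Assumptions not stated above: "65" means 65% of heap, the heap is a hypothetical 7.5 GB, and all 20 shards are active on this single node:

```python
import json

# Index-creation body with the shard/replica counts from above.
# indices.memory.index_buffer_size is NOT here: it is a node setting
# that belongs in elasticsearch.yml.
index_settings = {
    "settings": {
        "number_of_shards": 20,
        "number_of_replicas": 0,
    }
}

def per_shard_buffer_mb(heap_gb: float, buffer_pct: float, active_shards: int) -> float:
    """Indexing buffer per active shard if the node-wide buffer is split evenly."""
    total_mb = heap_gb * 1024 * buffer_pct / 100
    return total_mb / active_shards

HEAP_GB = 7.5  # hypothetical: roughly half of an m1.xlarge's 15 GB of RAM

print(json.dumps(index_settings))
print(f"{per_shard_buffer_mb(HEAP_GB, 65, 20):.0f} MB per shard")
```

Under those assumptions each shard gets about 250 MB of indexing buffer; with a different heap size or percentage the figure scales linearly.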
The box is also pretty beefy: it's an m1.xlarge with a 2000 IOPS EBS volume
mounted, and I am using the ES cookbook, so the heap is already set up
correctly. Disk and CPU both appear to have power to spare.
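To get a timestamped record of when the drop happens (rather than eyeballing bigdesk), the rate can be sampled straight from ES. This is a minimal sketch, assuming the indices stats response exposes a cumulative counter at `_all.primaries.indexing.index_total`; that path should be checked against the `/_stats` output of the ES version actually running. The helper names are hypothetical:

```python
import json
import time
import urllib.request

def docs_per_sec(count_before: int, count_after: int, seconds: float) -> float:
    """Average indexing rate between two samples of a cumulative doc counter."""
    return (count_after - count_before) / seconds

def index_total(es_url: str = "http://localhost:9200") -> int:
    """Read the cumulative primary index_total counter from the stats API.

    Assumes the response layout _all.primaries.indexing.index_total.
    """
    with urllib.request.urlopen(f"{es_url}/_stats") as resp:
        stats = json.load(resp)
    return stats["_all"]["primaries"]["indexing"]["index_total"]

def sample_rate(interval_s: float = 60.0) -> float:
    """Sample the counter twice, interval_s apart, and return docs/s."""
    before = index_total()
    time.sleep(interval_s)
    after = index_total()
    return docs_per_sec(before, after, interval_s)
```

Logging `sample_rate()` every few minutes next to the disk and CPU graphs would show exactly when the rate starts sliding from ~2,000 toward ~50 docs/s.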
Any idea what I am missing? I am pretty sure others have used the
couchdb-river to import large numbers of docs at a decent speed.
Zuhaib
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.