Bulk Loading Performance slows to a crawl


(derrickburns) #1

I am trying to index 80M tweets with a bulk loader that I wrote using
the Java API on 0.16.4.

When I get to about 2M tweets, indexing performance drops from 4K
indexed tweets per second to under 300 tweets per second. CPU load
goes from 30% to 3%. Heap memory bounces between 300MB and 600Mb and
then spikes to 900Mb.

Here are my index settings:

		index.engine.robin.refresh_interval, 10
		indices.memory.index_buffer_size, 0.50
 		index.number_of_shards, 4
		index.number_of_replicas, 0
		index.merge.policy.merge_factor, 30
		index.merge.policy.use_compound_file, false;
		index.refresh_interval, "-1"

I am doing this on a index I just created on a local node.

I am running on a new quad-core i7 Macbook Pro.

Ideas???


(system) #2