So your docs are about 10K each? Are you doing any other kind of
transformation on your data? The code you showed is virtually the same
as mine, but I'm not using a DataItem. My data files are JSON, one
document per line. I simply read each row, set the id and source, then
use client.prepareIndex(). We index 10,000 docs at a time. The files
contain 20M docs and are gzip-compressed. It's basically just as fast
to read from compressed files, plus you get much smaller files to push
around.
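For reference, here's roughly what my loop looks like. This is a minimal sketch against the Java client's bulk API; the index/type names and the sequential id are placeholders for whatever comes out of your rows, and the batch size is the 10,000 mentioned above.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;

public class GzipBulkIndexer {

    // 10,000 docs per bulk request, as described above
    private static final int BATCH_SIZE = 10000;

    public static void indexFile(Client client, String path) throws Exception {
        // Read the gzipped file line by line; each line is one JSON document.
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(path)), "UTF-8"));
        try {
            BulkRequestBuilder bulk = client.prepareBulk();
            int count = 0;
            long id = 0; // placeholder: real ids would come from the row itself
            String line;
            while ((line = reader.readLine()) != null) {
                // Set the id and source for each row, then queue it on the bulk request.
                bulk.add(client.prepareIndex("blogs", "article", String.valueOf(id++))
                        .setSource(line));
                if (++count == BATCH_SIZE) {
                    BulkResponse resp = bulk.execute().actionGet();
                    if (resp.hasFailures()) {
                        System.err.println("bulk failures: " + resp.buildFailureMessage());
                    }
                    bulk = client.prepareBulk();
                    count = 0;
                }
            }
            if (count > 0) {
                bulk.execute().actionGet(); // flush the final partial batch
            }
        } finally {
            reader.close();
        }
    }
}

The GZIPInputStream is the only extra piece over reading plain files, which is why it's basically just as fast.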
On Jul 9, 8:58 pm, allwefantasy allwefant...@gmail.com wrote:
6 million blog articles will be indexed, and the whole index is 62G.
1000 DataItems are indexed at a time.
From: imarcticblue
Sent: Saturday, July 09, 2011 1:15 AM
Subject: Re: ElasticSearch bulk api performance
How large are your documents and how many are you indexing at a time? We're not using a DataItem, just raw JSON, and we can index 40M records in about 3.5 hours (~3200 docs/sec). Average record size is 0.5K. This is on AWS with 2 large nodes, 32 shards, and 2 replicas. Your hardware looks beefier than what we have on AWS.