Tuning the refresh?

I use ES to index the fields of our documents. Each document has 100 fields, and
none of the fields are analyzed.
I have run into a performance issue while indexing. I use the ES Java library and
call BulkRequestBuilder with 200 documents per batch.
It was fine before, but now I'm seeing significant delays.
In the ES log I see:

12411:[2012-08-31 11:53:50,532][TRACE][index.shard.service ] [Jaspers, James] [dms][2] index [Document<stored,binary,omitNorms,indexOptions=D
12412:[2012-08-31 11:53:50,532][TRACE][index.shard.service ] [Jaspers, James] [dms][3] index [Document<stored,binary,omitNorms,indexOptions=D
12413:[2012-08-31 11:53:50,532][TRACE][index.shard.service ] [Jaspers, James] [dms][2] index [Document<stored,binary,omitNorms,indexOptions=D
12414:[2012-08-31 11:53:50,533][TRACE][index.shard.service ] [Jaspers, James] [dms][3] index [Document<stored,binary,omitNorms,indexOptions=D
...and many more lines like those above, then:

7040:[2012-08-31 11:52:14,129][TRACE][index.shard.service ] [Jaspers, James] [dms][0] refresh with waitForOperations[false]
7041:[2012-08-31 11:52:14,787][TRACE][index.shard.service ] [Jaspers, James] [dms][2] refresh with waitForOperations[false]
7042:[2012-08-31 11:52:14,890][TRACE][index.shard.service ] [Jaspers, James] [dms][1] refresh with waitForOperations[false]

...and then it waits for 4-5 seconds.
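
To put a number on those waits, a small helper along these lines could scan the trace log for long gaps between consecutive entries. This is only a rough sketch under my own assumptions: the class name LogGapFinder and the minGapMillis threshold are made up for illustration and are not part of ES, it assumes each log entry carries a timestamp in the format shown above, and it needs Java 8 or later for java.time.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Elasticsearch.
final class LogGapFinder {
    // Matches the first bracketed timestamp on a line, e.g. [2012-08-31 11:52:14,129]
    private static final Pattern TIMESTAMP =
        Pattern.compile("\\[(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\]");
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the gaps (in milliseconds) between consecutive timestamped
    // lines, keeping only gaps of at least minGapMillis.
    static List<Long> findGaps(List<String> lines, long minGapMillis) {
        List<Long> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            Matcher m = TIMESTAMP.matcher(line);
            if (!m.find()) {
                continue; // skip wrapped continuation lines and other noise
            }
            LocalDateTime current = LocalDateTime.parse(m.group(1), FORMAT);
            if (previous != null) {
                long gap = Duration.between(previous, current).toMillis();
                if (gap >= minGapMillis) {
                    gaps.add(gap);
                }
            }
            previous = current;
        }
        return gaps;
    }
}
```

Calling LogGapFinder.findGaps(lines, 1000) on the lines of the log file would list every pause of one second or longer, which should make it easier to see how often the stall occurs.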

That's the performance issue I'm trying to figure out.
While this is happening, CPU usage is very high: I have 2 clients indexing, and I can clearly see that 2 cores are 99% utilized.
ES runs with a 4GB heap, and there is no major GC at all (I'm comfortable reading GC logs).
The server has 8 cores and plenty of free memory, with no significant disk I/O wait and no swapping in or out.

Any clues as to what I should be doing?