Hi all, I'm currently working on a project where Elasticsearch is our
backend, but we have been running into issues with insert rates. Some
background: our cluster is four physical boxes, each with 32 CPU cores
and 252 GB of RAM. Each box runs a data node, a master node and a search
node. On two other machines with the same hardware specs we run a Java
app that pulls our data from Kafka, adjusts it, and then inserts it into
Elasticsearch.
In the Java app we are using the "node" style client along with the
BulkProcessor class to handle our inserts. Everything is running on
Elasticsearch 0.90.5 with Java 1.7.0_45. The issue we are running into is
that we can't seem to get over about 7k inserts per second per Java app
(so 14k total, since we have two instances of the app running). At around
6.5k-7k per second, the Elasticsearch inserts start to lag behind how
fast we're pulling the data from Kafka. Our initial thought was that the
"data adjusting" stage of our app was causing the latency, but we've been
able to rule that out by adding some metrics around that part of the app.
Everything is fine until we reach the point where we want to do inserts. My
question is: are there any other users out there pushing ~10k inserts per
second (that is our goal) using the Java API? If so, would you mind sharing
some of the settings you are using? We've tried adjusting the BulkProcessor
concurrent request count and bulk size, but nothing seems to really improve
it. One thing I've noticed in our monitoring is that our Elasticsearch
client sometimes seems to get backed up. We'll see inserts chugging along
at 6k per second, then the rate drops off, and after a few seconds it
climbs back up. No GCs happen during this time, so I'm not sure what would
be causing that.
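
For readers who want to make the same comparison between the adjusting
stage and the insert stage, here is a minimal sketch of per-stage timing in
plain Java. It is not the code from the app described above: the class name
StageTimer, its methods, and the stand-in work inside the loop are all
hypothetical and only illustrate the measurement idea.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: accumulates elapsed time for one pipeline stage.
final class StageTimer {
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong calls = new AtomicLong();

    // Record one measured duration, in nanoseconds.
    void record(long elapsedNanos) {
        totalNanos.addAndGet(elapsedNanos);
        calls.incrementAndGet();
    }

    long count() {
        return calls.get();
    }

    // Average duration per call in milliseconds (0.0 if nothing recorded).
    // The two counters are read separately, so under concurrent writers
    // the result is approximate, which is fine for monitoring.
    double averageMillis() {
        long n = calls.get();
        if (n == 0) {
            return 0.0;
        }
        return (double) totalNanos.get() / n / 1000000.0;
    }

    public static void main(String[] args) {
        StageTimer adjust = new StageTimer();
        StageTimer insert = new StageTimer();
        for (int i = 0; i < 1000; i++) {
            long t0 = System.nanoTime();
            String doc = ("record-" + i).toUpperCase(); // stand-in for the adjusting stage
            long t1 = System.nanoTime();
            doc.hashCode(); // stand-in for handing the document to the bulk client
            long t2 = System.nanoTime();
            adjust.record(t1 - t0);
            insert.record(t2 - t1);
        }
        System.out.printf("adjust avg: %.6f ms over %d calls%n",
                adjust.averageMillis(), adjust.count());
        System.out.printf("insert avg: %.6f ms over %d calls%n",
                insert.averageMillis(), insert.count());
    }
}
```

Logging these averages once per second, next to the insert rate, would
show whether the dips line up with the insert call itself taking longer
(for example, blocking while earlier bulk requests are still in flight) or
with something upstream.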
The health of the boxes while we're running looks fine (both on the ES
nodes and where our app lives), and inside the JVM everything seems to be
OK as well (no huge GCs or anything). I've searched this list and found
people talking about doing 10k inserts per second, so we know it's totally
possible; we just can't seem to find the right setup to get to that
number. Any suggestions or tips would be greatly appreciated!
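
One way to reason about the concurrent request count and bulk size
together is a back-of-envelope ceiling: if at most C bulk requests are in
flight, each carrying B documents and taking T milliseconds to complete,
throughput cannot exceed C * B * 1000 / T documents per second. The sketch
below only does that arithmetic. The class name, method names, and all
numbers are hypothetical placeholders, not measurements from this cluster,
and it makes no claim about how BulkProcessor schedules requests
internally.

```java
// Back-of-envelope ceiling for bulk indexing throughput.
// All inputs are hypothetical; plug in values measured on a real cluster.
final class BulkThroughputEstimate {

    // Upper bound on documents per second with `concurrentRequests` bulk
    // requests in flight, `docsPerBulk` documents each, and
    // `bulkLatencyMillis` per request.
    static double maxDocsPerSecond(int concurrentRequests, int docsPerBulk,
                                   double bulkLatencyMillis) {
        if (concurrentRequests <= 0 || docsPerBulk <= 0 || bulkLatencyMillis <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        return concurrentRequests * (double) docsPerBulk * 1000.0 / bulkLatencyMillis;
    }

    // Bulk latency (ms) each request must stay under to sustain the target rate.
    static double requiredLatencyMillis(int concurrentRequests, int docsPerBulk,
                                        double targetDocsPerSecond) {
        if (concurrentRequests <= 0 || docsPerBulk <= 0 || targetDocsPerSecond <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        return concurrentRequests * (double) docsPerBulk * 1000.0 / targetDocsPerSecond;
    }

    public static void main(String[] args) {
        // Example: 2 concurrent requests of 1000 docs, 250 ms per bulk request.
        System.out.println(maxDocsPerSecond(2, 1000, 250.0));
        // Latency budget per bulk request for a 10k docs/s target.
        System.out.println(requiredLatencyMillis(2, 1000, 10000.0));
    }
}
```

If the measured bulk latency grows roughly in proportion to the bulk size,
raising the bulk size alone would not be expected to lift this ceiling,
which may be worth checking against the observation that tuning those two
settings did not help.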