Jagged Index Request Rate

I'm using ES 1.3 to ingest information from multiple (25) servers. Our
servers index a single document at a time using the Java API,
totaling about 100-200 documents per second depending on the time of
day. In Marvel, I see the index requests as a very jagged graph
( https://www.dropbox.com/s/6snw2wxbv264c5w/Screenshot%202014-11-13%2009.42.47.png?dl=0
), and I'm trying to isolate the bottleneck. I'm pretty confident the
data is coming in at a smoother rate, so I'm not sure whether there is an
indexing bottleneck in my servers or in my ES cluster. None of the ES
nodes shows more than a 0.1 load average or about 20 IOPS, so I don't
think the bottleneck is on the ES cluster, but I can't figure out where
it would be on the client either.

Are 10 index requests a second too much for a single Elasticsearch Java
client? Should I consider switching to bulk inserts? I would imagine the
Java client can handle much more throughput than that. I just want to make
sure I'm not losing any data anywhere.

Thanks for the help!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5b1d076f-69bd-47c9-86de-dd6c419b23b6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

"10 index requests a second" is not too much.

Note that the jagged rate may also result from skew in the measurement
intervals, so it may be an artifact and nothing to worry about.

Bulk is definitely the way to go. With bulk requests, you can index around
10k-20k docs per second (average size 1 KB) from a single machine to
another single machine, using concurrency.
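In the ES 1.x Java client, the BulkProcessor helper does this batching for you (by document count, byte size, or flush interval). The idea itself is simple enough to sketch without the client library; `DocBatcher` and its flush callback below are illustrative names I've made up, not ES APIs — the callback is where a real implementation would send one bulk request:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal client-side batcher: collects single documents and hands them
// off in groups, the same idea BulkProcessor implements for Elasticsearch.
class DocBatcher {
    private final int batchSize;
    private final Consumer<List<String>> flusher; // e.g. sends one bulk request
    private final List<String> buffer = new ArrayList<>();
    private int flushes = 0;

    DocBatcher(int batchSize, Consumer<List<String>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        flusher.accept(new ArrayList<>(buffer));
        buffer.clear();
        flushes++;
    }

    int flushCount() { return flushes; }
}
```

Instead of 250 network round trips for 250 documents, this issues 3 — which is where most of the 10k-20k docs/sec headroom comes from.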

You cannot lose data with correct coding; ES automatically rejects
requests with exceptions when it is too heavily loaded. In that case,
close the client immediately and restart the indexing from the beginning
with a more modest configuration.
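An alternative to restarting from scratch is to catch the rejection and retry with backoff. A minimal sketch of that pattern follows; `RetryingIndexer` and `withRetry` are hypothetical names, and in a real client the caught exception would be the rejection exception thrown by the ES client rather than a generic `Exception`:

```java
import java.util.concurrent.Callable;

// Retry helper for load-induced rejections: back off and try again a few
// times before giving up, rather than silently dropping the document.
class RetryingIndexer {
    static <T> T withRetry(Callable<T> action, int maxAttempts, long baseDelayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call(); // e.g. client.index(request).actionGet()
            } catch (Exception e) {   // e.g. EsRejectedExecutionException
                last = e;
                Thread.sleep(baseDelayMs << attempt); // exponential backoff
            }
        }
        throw last; // still failing after maxAttempts: surface the error
    }
}
```

Because the failure is surfaced as an exception either way, the caller always knows whether a document made it in — nothing is lost silently.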

Jörg

On Thu, Nov 13, 2014 at 6:55 PM, Mike Seid mike@naytev.com wrote:



Thanks! This makes a lot of sense, and it's reassuring to know that data
won't be lost without an error.

I'm guessing you're right about the jagged index rate, as the graph is
pretty consistent across different levels of load. I'll definitely look
into bulk as load grows.

Cheers,

Mike

On Thursday, November 13, 2014 10:40:48 AM UTC-8, Jörg Prante wrote:

