Odd behavior of bulk loading speed - good riddle?

So this has me perplexed.

I have a bulk data loading job that creates an upsert statement and batches
500 of them in a bulk operation using the _bulk interface.

I send the bulk insert via HTTP (on 9200) and wait for the response before
sending the next one, which I do immediately.
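
As a minimal sketch of the batching described above (not the poster's actual code), the following groups operations into _bulk request bodies of 500. The helper name buildBulkBodies, the index name "domains", the type "domain" and the placeholder documents are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: turn [action, body] pairs into _bulk request bodies
// holding at most $batchSize operations each.
function buildBulkBodies(array $operations, int $batchSize = 500): array
{
    $bodies = [];
    foreach (array_chunk($operations, $batchSize) as $chunk) {
        $lines = [];
        foreach ($chunk as [$action, $body]) {
            $lines[] = json_encode($action); // action/metadata line
            $lines[] = json_encode($body);   // request body line
        }
        // Each line is one JSON object; the body must end with a newline.
        $bodies[] = implode("\n", $lines) . "\n";
    }
    return $bodies;
}

// Placeholder data: 1200 update operations.
$operations = [];
for ($i = 0; $i < 1200; $i++) {
    $operations[] = [
        ['update' => ['_index' => 'domains', '_type' => 'domain', '_id' => "d$i"]],
        ['doc' => ['price' => 0], 'upsert' => ['price' => 0]],
    ];
}

$bodies = buildBulkBodies($operations);
echo count($bodies), "\n"; // prints 3 (batches of 500, 500 and 200)
```

Each body would then be POSTed to the _bulk endpoint in turn, waiting for the response before sending the next.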

I do not hit any thread pool limits.

I have replicas set to zero and refresh interval set to -1 to make the
loading as lightweight as possible.
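
For reference, the two settings mentioned above could be expressed as an index settings body like the one below. This is only a sketch: the "restore" values of 1 replica and a 1s refresh interval are assumed defaults rather than anything stated in this thread, and the variable names are made up.

```php
<?php
// Body for PUT /<index>/_settings before the load starts.
$loadSettings = [
    'index' => ['number_of_replicas' => 0, 'refresh_interval' => '-1'],
];

// Body to restore afterwards. These values are assumptions (the usual
// defaults), not taken from the original post.
$normalSettings = [
    'index' => ['number_of_replicas' => 1, 'refresh_interval' => '1s'],
];

echo json_encode($loadSettings), "\n";
```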

Timing these, they start out pretty fast and run about 2000 documents per
second, which is four or so HTTP round trips per second.

This lasts for a few minutes and then it starts to slow. Within an hour,
it's running about 1200 per second. In another hour, it's down to about 600
per second. Then it seems to flatten out at about 400 per second until the
job is done, some 8 million documents later.

So my question is - why the slowdown? It's very consistent, seems
reasonably linear, and happens 100% of the time.

Any clues?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a787d461-f467-4f79-943b-e65e12492783%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

The statement, if that helps (this is a line of PHP, hence the $ variables):

"{"script" : "ctx._source.auctionid=$auctionID;
ctx._source.auctiontype=$auctionType;
ctx._source.auctionstatus=$auctionStatus;
ctx._source.auctionprice=$auctionPrice;
ctx._source.auctionendtime='$auctionEndTime';
ctx._source.auctionadult=$adultListingFlag;", "upsert": { "auctionid":
$auctionID, "auctiontype": $auctionType, "auctionstatus":
$auctionStatus, "auctionprice": $auctionPrice, "auctionendtime":
"$auctionEndTime", "auctionadult": $adultListingFlag, "domaintype":
"auction", "fqdn": "$fqdn", "sld": "$sld", "tld": "$tld",
"vendorid": 6, "price": 0, "commissionrate": 0, "isfasttransfer":
false, "isadult": $aFlag, "istaboo": $tFlag, "sldlen": $sldlen,
"numhyphens": $numhyphens, "numdigits": $numdigits, "tokens": " .
(($tokens == null) ? '""' : json_encode($tokens)) . "}}"

Creates a document if it doesn't exist, updates it if it does.

--

Nobody?

No ideas why bulk upserts slow down over time?

Loading 9 million documents starts off at 2000+ per second and, by hour
three, is down to 300 per second. The whole job takes the better part of 8
hours, with this linear slowdown.

Nobody has an idea? I'm drawing a blank, myself!

--

FYI it is the weekend still for parts of the world, and we all enjoy our
time off :)

How many nodes do you have? What is your heap size? Are you monitoring your
system and ES? If so, what does it tell you? Have you tried increasing the
bulk count?

(In reply to Christopher Ambler's message of 24 November 2014, 16:48.)

--

There are a lot of possible reasons. Just to name a few:

  • client program errors
  • network issues
  • server issues (too few nodes, query load, tight resources)
  • improper settings, e.g. for fast segment merge
  • myriads of new fields
  • client does not evaluate batch response
  • etc. etc.
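
To illustrate the "client does not evaluate batch response" point, here is a rough sketch of how a client might inspect a decoded _bulk response. It assumes the response carries a top-level "errors" flag and one entry per operation under "items". The function name failedBulkItems and the sample response are invented for this example.

```php
<?php
// Return the failed entries of a decoded _bulk response, keyed by their
// position in the batch. An empty result means nothing was reported as failed.
function failedBulkItems(array $response): array
{
    if (empty($response['errors'])) {
        return [];
    }
    $failed = [];
    foreach ($response['items'] as $position => $item) {
        // Each item has a single key naming the action (index, update, ...).
        $result = reset($item);
        if (isset($result['error']) || ($result['status'] ?? 200) >= 300) {
            $failed[$position] = $result;
        }
    }
    return $failed;
}

// Made-up sample response, for illustration only.
$sample = json_decode('{"took":5,"errors":true,"items":[
  {"update":{"_id":"a","status":200}},
  {"update":{"_id":"b","status":429,"error":"rejected"}}]}', true);

$failed = failedBulkItems($sample);
echo implode(',', array_keys($failed)), "\n"; // prints 1
```

A loader that only checks the HTTP status of the bulk call can miss per-item rejections, so retrying or logging the failed positions is usually worthwhile.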

Even 2000+ per second is ridiculously slow for a multithreaded client and
multiple nodes. From your observation that it gets slow after a few minutes,
I assume it has to do with client program errors or improper settings for
fast segment merge.

Jörg

(In reply to Christopher Ambler's message of Mon, Nov 24, 2014, 6:48 AM.)

--

Which version of ES?

This is probably not related to the slowdown, but when using scripts for
updating docs, it's best to keep the script constant, and use params for
the changing values (all the $vars in your PHP script). This means ES will
compile the script once and reuse that, vs paying compilation cost for
every update.
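
A sketch of what that could look like in PHP, assuming the 1.x-style update body where "script" is a plain string and "params" sits next to it (newer versions nest these differently). Only three of the fields are shown. The constant UPDATE_SCRIPT, the function name scriptedUpsertBody and the sample values are hypothetical.

```php
<?php
// The script text is identical for every document, so it can be compiled
// once and reused. Changing values are passed as params instead of being
// interpolated into the script string.
const UPDATE_SCRIPT =
    'ctx._source.auctionid = auctionid; ' .
    'ctx._source.auctionprice = auctionprice; ' .
    'ctx._source.auctionendtime = auctionendtime;';

// Build the body line of one bulk update operation.
function scriptedUpsertBody(array $values, array $upsertDoc): string
{
    return json_encode([
        'script' => UPDATE_SCRIPT,
        'params' => $values,
        'upsert' => $upsertDoc,
    ]);
}

// Placeholder values, for illustration only.
$values = ['auctionid' => 42, 'auctionprice' => 100, 'auctionendtime' => '2014-12-01'];
$line = scriptedUpsertBody($values, $values + ['domaintype' => 'auction']);
echo $line, "\n";
```

Using json_encode for the whole body also avoids hand-quoting string values such as the end time.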

Do the node logs say anything about index throttling?

Maybe catch and post some hot threads once you're down to 400 per second?

Mike McCandless

http://blog.mikemccandless.com

(In reply to Christopher Ambler's original message of Thu, Nov 20, 2014,
4:08 PM.)

--

Refactoring my statement from a script to a straight update { doc,
doc_as_upsert } seems to have done the trick. So rather than diagnose
what's odd about the script, this has resolved my issue. Yeah, lazy
solution, but a more optimal one ;)
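
For anyone finding this later, a script-free bulk update along those lines might look roughly like the sketch below. The function name docUpsertLines, the index "domains", the type "domain" and the sample document are assumptions, not the real job.

```php
<?php
// Build the two lines of one bulk update operation that uses a partial
// document instead of a script. With doc_as_upsert set to true, the same
// document is used to create the record when it does not exist yet.
function docUpsertLines(string $index, string $type, string $id, array $doc): string
{
    $action = ['update' => ['_index' => $index, '_type' => $type, '_id' => $id]];
    $body = ['doc' => $doc, 'doc_as_upsert' => true];
    return json_encode($action) . "\n" . json_encode($body) . "\n";
}

// Placeholder values, for illustration only.
$lines = docUpsertLines('domains', 'domain', 'example.com', [
    'auctionid' => 42,
    'domaintype' => 'auction',
]);
echo $lines;
```

One difference from the scripted version is worth keeping in mind: every field in doc is merged into an existing document, whereas the original script only touched the auction fields on update. If that matters, separate "doc" and "upsert" bodies can be sent instead.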
