Bulk insert vs Single insert

Hi All,

The primary dev managing our ES cluster says that single-document writes
to ES will only give us roughly 30 to 40 writes per second, whereas bulk
operations will give us more in the range of 1,000+. I realize that bulk is
generally faster and that any process has hardware / environment
constraints. However, with other technologies you do not pay such a heavy
price for single insertions. I am obviously ignorant when it comes to ES,
but why do you pay such a heavy price for single-document writes in ES? Or
are we just not properly informed?

Environment:

  • Apache Storm writes to our ES cluster
  • Currently all of the writes are processed in bulk operations.
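
For readers unfamiliar with the pattern, the batching described above might look roughly like the sketch below. It is a hypothetical illustration in Python, not the actual Storm code: `BatchBuffer`, `batch_size`, and `flush_fn` are made-up names, and `flush_fn` stands in for whatever call really sends a batch to ES.

```python
class BatchBuffer:
    """Collects documents and hands them to flush_fn in fixed-size batches."""

    def __init__(self, batch_size, flush_fn):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.flush_fn = flush_fn  # hypothetical callback that sends one batch
        self._pending = []

    def add(self, doc):
        # Buffer the document; send the batch once it is full.
        self._pending.append(doc)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered; also call this on shutdown so the
        # final partial batch is not lost.
        if self._pending:
            batch, self._pending = self._pending, []
            self.flush_fn(batch)


sent = []  # records each batch instead of talking to a real cluster
buf = BatchBuffer(batch_size=3, flush_fn=sent.append)
for i in range(7):
    buf.add({"seq": i})
buf.flush()
print([len(b) for b in sent])  # [3, 3, 1]
```

A real pipeline would probably also flush on a timer so that a slow trickle of documents does not sit in the buffer indefinitely; that is left out here to keep the sketch short.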

ES Configuration:

  • 11 data nodes

    • 2x AMD Opteron(TM) Processor 6272 (16 cores @ 2.1/3.0 GHz, 16 MB L3
      cache)
    • 256 GB RAM
    • 12 TB (7200 RPM platter disks in LVM ext4 configuration)
  • ES configuration

    • two instances per node (16 cores per instance)
    • 30 GB RAM lock-in per instance (max recommended by ES)
    • 18 shards per index (empirically best combo of RAM vs. shard
      trade-off)

Any information / suggestions would be greatly appreciated.

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8ad4c98d-34ca-4205-b763-88e1392cf57c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

From what I understand (which may not be 100% right), most of the overhead
is in generating and handling an HTTP request for every document, which is
a heavy operation.
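
To make the per-request cost concrete, here is a minimal sketch of how a bulk request packs many documents into one HTTP body, and how many requests each approach needs. It only builds the payload and does not contact a cluster. The index name `events`, the sample documents, and the helper names `build_bulk_body` and `request_count` are assumptions for illustration; older ES versions also expect a `_type` field in the action line.

```python
import json
import math


def build_bulk_body(index_name, docs):
    """Serialize (id, document) pairs into a newline-delimited _bulk body."""
    lines = []
    for doc_id, doc in docs:
        # Action line: what to do with the document on the next line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def request_count(num_docs, batch_size):
    """Number of HTTP requests needed to send num_docs in batches."""
    return math.ceil(num_docs / batch_size)


docs = [(str(i), {"event": "click", "seq": i}) for i in range(3)]
print(build_bulk_body("events", docs), end="")

print(request_count(10_000, 1))    # 10000 requests, one per document
print(request_count(10_000, 500))  # 20 requests of 500 documents each
```

If each request carries a roughly fixed cost (connection handling, parsing, a round trip), then cutting 10,000 requests down to 20 removes most of that cost, which would be consistent with the gap described in the question. Whether that fully explains the 30 to 40 writes per second figure depends on the client and cluster setup.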

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 11 October 2014 03:35, mike.giardinelli@gmail.com wrote:

