Cannot Increase Write TPS in Elasticsearch by adding more nodes


(pranav amin) #1

Hi all,

While doing some prototyping in ES using SSDs we got good Write TPS, but
the Write TPS saturated after adding more nodes!

Here are the details I used for prototyping:

Requirement: to read data back as soon as possible, since each read is
followed by a write.
Version of ES: 1.0.0
Document Size: 144 KB
Use of SSD for Storage: Yes
Benchmarking Tool: SoapUI or JMeter
VM: Ubuntu, 64-bit OS
Total Nodes: 12
Total Shards: 60 (20 primaries with 2 replicas each)
Threads: 200
Replicas: 2
Index Shards: 20
Total Indices: 1
Hardware configuration: 4 CPU, 6 GB RAM, 3 GB Heap

Using the above setup we got Write TPS ~= 500.

We wanted to see whether adding more nodes would increase our Write TPS,
but it didn't:

  • Adding 3 more nodes (i.e. Total Nodes = 15) increased TPS by only ~10,
    to ~510.
  • Adding more hardware (CPU, RAM) and increasing the heap didn't help
    either [8 CPU, 12 GB RAM, 5 GB Heap].

Can someone help out or suggest what might be wrong? Conceptually, ES
should scale in Write & Read TPS as nodes are added, but we aren't seeing
that.

It would be much appreciated if someone could point us in the right
direction. Let me know if more information is needed.

Thanks
Pranav.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1e34d7c7-d3da-40c7-8fca-16281494065b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

There are many possible causes; to name a few:

  • benchmarking tool setup (does it report correct numbers?)
  • network bandwidth limits
  • cluster setup (e.g. complex mapping, high latency between nodes)
  • pattern of the data input
  • method of data input (bulk vs. index, HTTP vs. Java API)
  • concurrency of server/client connections
  • thread pooling resource limits
  • SSD setup (e.g. RAID 0)
  • resource limits due to the hardware setup (the host OS often puts hard
    restrictions on guest VM resources)
  • etc.

First, I recommend testing a single VM for basic performance, especially
raw SSD write throughput.
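As a rough way to check raw sequential write throughput without extra tooling, a minimal Python sketch like the following can be run on each VM (the file path, sizes, and the use of fsync are my assumptions, not anything from this thread; dedicated tools will give more precise numbers):

```python
import os
import time

def raw_write_throughput(path="/tmp/es_write_test.bin",
                         total_mb=256, chunk_mb=4):
    """Sequentially write total_mb of data to path and return MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force data to disk, not just the page cache
    elapsed = time.perf_counter() - start
    os.remove(path)
    return total_mb / elapsed

print(f"~{raw_write_throughput(total_mb=64):.0f} MB/s sequential write")
```

If this number on a single VM is already close to the ~70 MB/s the cluster ingests, the disks (or the hypervisor's I/O limits) are a likely bottleneck.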

Then test document processing, both without and with a single-node ES, to
measure the source-processing overhead on the client side and rule out
issues caused by the clients.

Then index to a single node, and repeat while doubling the node count at
each step, to measure the performance gain.

Do you see suspicious measurements even on a single node?

Also noteworthy: VMs are generally quite slow compared to bare metal.
Performance depends on the host OS's VM setup, but do not expect miracles.

There are many ES monitoring tools/plugins that can assist you in getting
metrics.

Jörg



(pranav amin) #3

Thanks, Jörg, for your help.

Do you recommend any tool that can help me pinpoint bottlenecks in terms of
I/O, memory, network, GC, etc.?
I'm using some free tools (Marvel, Elastic HQ, etc.), but I'm not able to
figure out whether I'm hitting any limits.

Thanks
Pranav.



(Jörg Prante) #4

How are you trying to determine that you're hitting limits? I don't have
enough information to help.

Marvel, Elastic HQ, etc. are all very useful tools, but they should be
combined with OS-level monitoring to get an overall picture.

Jörg



(Mark Walkom) #5

One thing you haven't mentioned is which version of Java you are on, which
can affect things as well.

To give you some idea: we had a 12-node cluster of VMs with 30 GB heaps and
were seeing 12,000 TPS (incoming events), so what you are seeing is very low.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(pranav amin) #6

Thanks Mark.

We are using Java version 1.7.0_25.

What is your document size? I'm wondering if our 144 KB document size is
causing the low TPS.

Thanks
Pranav.



(Jörg Prante) #7

On bare metal I can sustain 10-12 MB/sec on a single node. Maybe you can
measure throughput in bytes per second; that is easier to compare.
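Applying that suggestion to the numbers reported earlier in this thread (~500 docs/sec at 144 KB each) is simple arithmetic:

```python
def tps_to_mb_per_sec(docs_per_sec, doc_kb):
    """Convert a documents/second rate into MB/second (using 1 MB = 1024 KB)."""
    return docs_per_sec * doc_kb / 1024.0

# Figures from the thread: ~500 TPS at 144 KB per document.
rate = tps_to_mb_per_sec(500, 144)
print(f"{rate:.1f} MB/s")  # prints "70.3 MB/s" (~72 MB/s with 1 MB = 1000 KB)
```

That per-cluster rate is already several times the sustained single-node rate quoted above, so the byte throughput may not be as low as the document TPS first suggests.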

Jörg



(Brian Yoder) #8

Following the suggestion from Jörg, I calculated a rough 72MB per second
(500 docs per second x 144KB/doc). Not too shabby!

One way to sustain a higher docs/second rate is to carefully craft a
mapping and:

  1. Disable the _all field. I rarely (never!) need this in a production
    application. My locked-down mappings always direct queries to specific
    fields and always get-by-id for the best overall performance whenever
    possible. And my recent ELK stack exploration shows me that the message
    field contains the entire source of each log entry, so there is no need for
    _all there either; just tell Kibana to search message instead of _all.

  2. Set "index" : "no" for the fields that you don't want or need to
    query.

You would be pleasantly surprised at how quickly a large document goes in
when you only index the subset of fields you actually wish to query, and
also disable the _all field.
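For the ES 1.x version used in this thread, the two suggestions above combine into a mapping like the following sketch. The index name and field names ("payload", "user_id", "timestamp") are purely illustrative, not from the original posts:

```python
import json

# Hypothetical ES 1.x index mapping: _all disabled, and only the fields
# you actually query get indexed.
mapping = {
    "mappings": {
        "doc": {
            "_all": {"enabled": False},  # 1. disable the _all field
            "properties": {
                # 2. stored in _source but not indexed (not searchable):
                "payload":   {"type": "string", "index": "no"},
                # indexed verbatim for exact-match queries:
                "user_id":   {"type": "string", "index": "not_analyzed"},
                "timestamp": {"type": "date"},
            },
        }
    }
}

# Use as the body of index creation, e.g.:
#   curl -XPUT localhost:9200/myindex -d @mapping.json
print(json.dumps(mapping, indent=2))
```

With a 144 KB document, skipping analysis of the bulk of the payload avoids most of the per-document CPU work at index time.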

Hope this helps!

Brian



(Greg Murnane) #9

I haven't seen it asked yet: what is feeding data into your Elasticsearch?
Depending on how you get the data there, a large document size could
easily bottleneck some feeding mechanisms. It's also notable that some
"green" spinning disks top out around 72 MB/s. It might be useful to make
sure that your feeding mechanism can handle more than 500 TPS.




(pranav amin) #10

We used JMeter for this test.



(Georgi Ivanov) #11

I don't know how you are doing the indexing.

Are you using bulk requests? Bulk indexing can greatly increase indexing
speed.

You can also check the node client. It should give better indexing speed,
because indexing becomes a one-hop operation, compared to two hops with the
transport client (assuming the Java API here).

You can hit the limits of the bulk thread pool (which can be increased) if
you are sending all indexing ops to only one server. You could try hitting
all the nodes on a round-robin basis.

You can monitor IOPS in Marvel (or with iostat locally on the server) to
see whether you are hitting an I/O limit.

On my ES cluster I reach 50k indexing ops per second.


