Elasticsearch index throughput


(Jayabal K) #1

We have an ES cluster set up as follows, but we can't index more than 1000 documents per second. We are trying to verify performance and scalability.

Client node (2 cpu,4GB memory) - 1
Data node (2 cpu,4GB memory) - 10
master (2 cpu,4GB memory) - 1

Expectation
We are trying to index 100k documents per second. Please let us know the best practices.

Thanks in advance


(David Turner) #2

What are you using for your benchmarks? We recommend Rally. Benchmarking is a tricky thing to get right and it's very easy to introduce errors or to measure something different from what you think you're measuring.

Are you indexing via the single client node? If you add a second client node do you get higher throughput? If so, it looks like you're measuring the throughput of a single client node rather than the capacity of the whole cluster.


(Jayabal K) #3

We are using our own simulator to generate the documents.
Yes, we are using a single coordinating node (client). We also tried with 2 coordinating nodes, but no luck.

Is it possible to index 100k documents per second? Do we need to change any cluster configuration?

What are the best practices for setting up an ES cluster?


(David Turner) #4

Yes. Our Rally-based benchmarks regularly exceed this indexing rate by some margin, on a 3-node cluster.


(Christian Dahlqvist) #5

What is the size and structure of your documents? Have you identified what is limiting throughput, e.g. CPU and/or disk I/O?


(Jayabal K) #6

Could you share your 3-node cluster details? It would be useful for analysing what is blocking in my setup.


(Jayabal K) #7

My documents are around 1 KB each and contain 20 attributes (key-value pairs).
I went through the Kibana monitoring, and it shows low CPU and disk utilization.
I don't know what is blocking. Please guide me on how to analyse it.


(David Turner) #8

Sure, the benchmark website has an overview:

Current environment

All benchmarks are run on bare-metal machines with the following specifications:

  • CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
  • RAM: 32 GB
  • SSD:
  • OS: Linux kernel version 4.13.0-38
  • OS TUNING:
    • /sys/kernel/mm/transparent_hugepage/enabled = always
    • /sys/kernel/mm/transparent_hugepage/defrag = always
  • JVM: Oracle JDK 1.8.0_131-b11

If it's not the client node, the next thing I'd suspect is the test harness itself. You're trying to push about 100MB/s of data at the cluster, and it's possible that the test harness you've written just isn't fast enough to do this.

If the bottleneck is Elasticsearch then I'd expect to see tasks building up on the client node. You can see how busy the client node is by looking at things like GET /_nodes/CLIENT-NODE-NAME/stats/thread_pool and GET /_tasks?nodes=CLIENT-NODE-NAME. If the client node looks quiet then the bottleneck is outside the system.
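
As a rough sketch of that check, the snippet below fetches the thread pool stats for one node and lists the pools that are not idle. The base URL and node name are placeholders, the helper names are made up for illustration, and the pool that handles indexing is named differently across Elasticsearch versions, so the code reports every busy pool rather than assuming one name.

```python
import json
import urllib.request


def fetch_thread_pool_stats(base_url, node_name):
    """GET /_nodes/<node_name>/stats/thread_pool and return the parsed JSON.

    base_url (e.g. "http://localhost:9200") and node_name are assumptions;
    substitute the values for your own cluster.
    """
    url = f"{base_url}/_nodes/{node_name}/stats/thread_pool"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def busy_pools(stats):
    """Return (node, pool, active, queue, rejected) for every non-idle pool."""
    rows = []
    for node in stats.get("nodes", {}).values():
        for pool, s in node.get("thread_pool", {}).items():
            active = s.get("active", 0)
            queue = s.get("queue", 0)
            rejected = s.get("rejected", 0)
            if active or queue or rejected:
                rows.append((node.get("name"), pool, active, queue, rejected))
    return rows
```

Calling `busy_pools(fetch_thread_pool_stats("http://localhost:9200", "CLIENT-NODE-NAME"))` a few times during a test run should show whether queues are growing or requests are being rejected. An empty result throughout the run suggests the node is quiet.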


(Jayabal K) #9

I was able to achieve 100K rps using Rally (the benchmark tool) via the bulk API. But when sending many single-document index requests in parallel, index throughput could not go beyond 1000 rps.

Is the bulk API the only way to achieve 100K rps? Or is there something else we should focus on for the single-document API?

Please advise.


(David Turner) #10

Yes, the bulk API is the correct way to achieve high indexing throughput.


(Jayabal K) #11

In our real-time scenario, we are likely to receive a large number of single-document index requests.

Is there any way to get good rps using the single-document index API?


(Christian Dahlqvist) #12

Indexing a single document at a time results in a lot more overhead per document in terms of request processing and syncing to disk and is therefore always going to be considerably slower than using bulk requests.
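
To make the difference concrete, here is a minimal sketch of how many documents can be packed into one bulk request body. The bulk API takes newline-delimited JSON: an action line followed by the document source, each on its own line, with a trailing newline. The function name, index name and documents are placeholders. Elasticsearch versions before 7 also expect a document type, which is not shown here.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited JSON body for POST /_bulk.

    Each document becomes two lines: an "index" action line naming the
    target index, then the document source. The body must end in a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical documents; send the result with
    # Content-Type: application/x-ndjson
    print(build_bulk_body("events", [{"user": "a", "value": 1},
                                     {"user": "b", "value": 2}]), end="")
```

One request built this way carries the whole batch, so the per-request overhead is paid once rather than once per document.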


(David Turner) #13

To get the best throughput (and lower latency too) it's normally best to collect your requests into reasonable-sized batches before indexing them. You say you have a realtime scenario: what exactly are your latency targets?
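
One way to do this collecting on the client side is a small buffer that flushes when it reaches a size limit or when its oldest document has waited too long. The sketch below is only an illustration: the class name, the thresholds and the `flush` callback are hypothetical, it is not thread-safe, and something (a timer or the request loop) has to call `maybe_flush()` periodically so that the time limit is honoured when traffic is slow.

```python
import time


class MicroBatcher:
    """Collect documents and hand them to `flush` in batches.

    A batch is sent when it holds `max_docs` documents or when the oldest
    buffered document has waited at least `max_wait_s` seconds.
    """

    def __init__(self, flush, max_docs=1000, max_wait_s=0.2, clock=time.monotonic):
        self._flush = flush          # callback taking a list of documents
        self._max_docs = max_docs
        self._max_wait_s = max_wait_s
        self._clock = clock          # injectable for testing
        self._buffer = []
        self._oldest = None          # arrival time of the oldest buffered doc

    def add(self, doc):
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(doc)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if the batch is full or has waited long enough."""
        if not self._buffer:
            return
        full = len(self._buffer) >= self._max_docs
        stale = self._clock() - self._oldest >= self._max_wait_s
        if full or stale:
            self.flush()

    def flush(self):
        """Send whatever is buffered, if anything."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._oldest = None
            self._flush(batch)
```

With this shape, `max_wait_s` is roughly the extra latency a document can pick up while waiting for its batch, which is why the latency target matters when choosing the thresholds.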


(Jayabal K) #14

I was able to get 50K rps using the bulk API (size: 5000) and an explicit index mapping (not dynamic mapping).

I have one question regarding the mapping.

Could you please explain what a field mapping is, and the difference between (type: text) and (field: { type: keyword })?


(David Turner) #15

Perhaps this helps: https://www.elastic.co/blog/strings-are-dead-long-live-strings
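
In short, a `text` field is analyzed for full-text search, a `keyword` field is indexed as a single exact value for filtering, sorting and aggregations, and a `text` field can carry a `keyword` sub-field so that both are available. A small illustration of the `properties` part of a mapping, with made-up field names, might look like this:

```python
import json

# Hypothetical field names; only the "properties" fragment of a mapping is shown.
properties = {
    # Analyzed: broken into terms for full-text search.
    "message": {"type": "text"},
    # Not analyzed: stored as one exact value.
    "status": {"type": "keyword"},
    # Both: full-text search on "hostname", exact matching and
    # aggregations on the sub-field "hostname.keyword".
    "hostname": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
}

mapping_fragment = {"properties": properties}
print(json.dumps(mapping_fragment, indent=2))
```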

If you have more questions about mapping then I suggest you open a new thread rather than continuing here.