Max Indexing rate per single Elasticsearch server

I have a single ES node (5.4.3) for running benchmarks. It is a 64 GB machine with a 32 GB heap. Everything else is at default settings, with memory locking enabled.

I am using the following settings for the benchmark tests (a sketch of the corresponding index creation follows the list):

Number of indices written: 10
Number of unique documents: 10
Concurrent clients: 10
Number of shards per index: 1
Number of replicas: 0
Bulk size: 5000, with each document having at most 10 fields and at most 50 characters per field value
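
This is a minimal sketch of creating an index with these settings using the elasticsearch Python client; the host and index names are placeholders, not values taken from the actual benchmark:

```python
from elasticsearch import Elasticsearch

# Hypothetical connection and index names; the real benchmark values are not shown in this thread.
es = Elasticsearch(["localhost:9200"])

for i in range(10):
    es.indices.create(
        index="bench-{}".format(i),
        body={
            "settings": {
                "number_of_shards": 1,       # one primary shard per index
                "number_of_replicas": 0      # no replicas
            }
        }
    )
```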

I am getting just 30,000 requests per second (around 7 MB/sec), and the indexing rate stays more or less the same when I change the test settings above.
I know these are default settings, but how do I know that I have maxed out a single node and need to scale? Is there a theoretical maximum, or can someone share the maximum indexing rate they have achieved?

Do you have monitoring installed? What do CPU and disk I/O look like during indexing? How large are your documents? What type of data do they contain? What do your mappings look like? How are you loading the data?

What do CPU and disk I/O look like during indexing?
CPU and I/O are around 20-30%. Below is the iostat output:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.30    0.00    2.03    0.84    0.00   82.83

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00   665.00    0.00  291.00     0.00 11808.00    81.15     1.01    3.46    0.00    3.46   1.03  30.00

How large are your documents?
There are only 10 unique documents in the whole benchmark. Each document contains at most 10 fields, with keys of 10 characters and values of 50 characters, so a document is at most 600 characters. Documents are sent to ES continuously in bulk requests of 5000 documents each.

What type of data do they contain?
Each character is alphabetic.

What do your mappings look like?
It is dynamic mapping; I haven't created any explicit mappings.
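
As a point of reference, this is a sketch (index name hypothetical) of checking what dynamic mapping produced; in 5.x a dynamically mapped string field normally becomes both text and keyword, which adds indexing work compared to an explicit keyword-only mapping:

```python
# Sketch: inspect the mapping that dynamic mapping generated for one of the benchmark indices.
mapping = es.indices.get_mapping(index="bench-0")
print(mapping)

# A dynamically mapped string field in 5.x typically looks like:
#   "field_1": {
#     "type": "text",
#     "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
#   }
# i.e. every value is analyzed as text *and* indexed as a keyword.
```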

How are you loading the data?
I am using the elasticsearch Python client's Elasticsearch().bulk() method to push bulk requests.
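
Roughly like this, a minimal sketch with placeholder host, index, and field names (the real benchmark code is not shown in the thread):

```python
import json
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])  # placeholder host

# Build one bulk request of 5000 dummy documents (10 fields x 50 characters each).
doc = {"field_{}".format(i): "a" * 50 for i in range(10)}
lines = []
for _ in range(5000):
    # No "_id" in the action line, so Elasticsearch auto-assigns document IDs.
    lines.append(json.dumps({"index": {"_index": "bench-0", "_type": "doc"}}))
    lines.append(json.dumps(doc))

response = es.bulk(body="\n".join(lines) + "\n")
print(response["took"], response["errors"])
```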

Here is the cluster stats output after a 30-second run:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "cluster",
  "timestamp" : 1498770136805,
  "status" : "green",
  "indices" : {
    "count" : 10,
    "shards" : {
      "total" : 10,
      "primaries" : 10,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
    },
    "docs" : {
      "count" : 905000,
      "deleted" : 0
    },
    "store" : {
      "size_in_bytes" : 100841187,
      "throttle_time_in_millis" : 0
    },
    "fielddata" : {
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 0,
      "total_count" : 0,
      "hit_count" : 0,
      "miss_count" : 0,
      "cache_size" : 0,
      "cache_count" : 0,
      "evictions" : 0
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 55,
      "memory_in_bytes" : 1652155,
      "terms_memory_in_bytes" : 1430943,
      "stored_fields_memory_in_bytes" : 53352,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 154880,
      "points_memory_in_bytes" : 0,
      "doc_values_memory_in_bytes" : 12980,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  },
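
For what it's worth, these stats line up with the rate reported earlier: 905,000 documents in roughly 30 seconds is about 30,000 docs/s, and the on-disk store grew by only a few MB/s (the store holds compressed segments, so this is smaller than the raw bytes sent):

```python
docs = 905000              # "docs.count" above
store_bytes = 100841187    # "store.size_in_bytes" above
elapsed = 30               # approximate run length in seconds

print(docs / float(elapsed))                    # ~30,167 docs/s
print(store_bytes / float(elapsed) / 1024**2)   # ~3.2 MB/s of store growth
```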

In benchmarks I have done, I am often able to saturate a node with around the same number of connections you are using, so I suspect it may be something in how you run the benchmark.

Are you allowing Elasticsearch to assign a document id or are you updating the same documents over and over? If you are in effect updating, how does throughput differ if you do not specify document ids in the bulk requests?
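
For clarity, the difference in the bulk action metadata looks roughly like this (index and type names are illustrative only):

```python
import json

# Auto-assigned ID: Elasticsearch generates the _id, which avoids the existence/version
# check and is the cheaper path for append-only indexing.
auto_id = json.dumps({"index": {"_index": "bench-0", "_type": "doc"}})

# Explicit ID: re-sending the same _id over and over effectively turns each operation
# into an update of an existing document, which is noticeably slower.
explicit_id = json.dumps({"index": {"_index": "bench-0", "_type": "doc", "_id": "42"}})
```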

In order to get a baseline, I would recommend running a few of the default benchmarks available with rally.

@Christian_Dahlqvist Thanks for the reply. I am allowing Elasticsearch to assign IDs automatically. I am using mostly default settings. What custom settings do you use? That would be helpful.

Can you also help me find the parameters worth tuning in each of the following situations? (A diagnostic sketch follows the list.)

  1. Node CPU is underutilized
  2. Node disk I/O is underutilized
  3. Node memory is underutilized
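
As a starting point for any of these, here is a minimal sketch of checking whether the bulk and index thread pools are busy or rejecting work, using the node stats API via the Python client:

```python
# Sketch: look at bulk/index thread pool activity and rejections on each node.
stats = es.nodes.stats(metric="thread_pool")
for node_id, node in stats["nodes"].items():
    for pool in ("bulk", "index"):
        tp = node["thread_pool"][pool]
        print(node_id, pool,
              "active:", tp["active"],
              "queue:", tp["queue"],
              "rejected:", tp["rejected"])
```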

I would recommend that you run Rally to get a comparison; a good track might be the default logging track. I have used Rally to saturate nodes, and since we know how it works we can compare your results to benchmarks that we have run.

Is Elasticsearch installed on a bare-metal server in your environment or is it a VM?

Currently it is a bare-metal installation. Any particular reason? I will try Rally and update this thread. Thanks for the suggestion.

@Christian_Dahlqvist Rally is showing an average throughput of 120k docs/s for the logging track. Where can I find the Rally benchmark configuration (indices, shards, etc.), and how is data bulk-sent to ES by the Rally benchmark? I will try to replicate the same setup.

One other thing I noticed was CPU usage: Rally reported a median of 500% while iostat showed about 50% idle time.

|   Lap |                          Metric |    Operation |     Value |   Unit |
|------:|--------------------------------:|-------------:|----------:|-------:|
|   All |                   Indexing time |              |   389.108 |    min |
|   All |                      Merge time |              |   121.269 |    min |
|   All |                    Refresh time |              |   15.4617 |    min |
|   All |                      Flush time |              |    7.4838 |    min |
|   All |             Merge throttle time |              |   49.7276 |    min |
|   All |                Median CPU usage |              |     500.4 |      % |
|   All |              Total Young Gen GC |              |   233.184 |      s |
|   All |                Total Old Gen GC |              |     14.96 |      s |
|   All |                      Index size |              |   19.2082 |     GB |
|   All |                 Totally written |              |   182.407 |     GB |
|   All |          Heap used for segments |              |   71.0596 |     MB |
|   All |        Heap used for doc values |              |  0.134235 |     MB |
|   All |             Heap used for terms |              |   58.6108 |     MB |
|   All |             Heap used for norms |              | 0.0319214 |     MB |
|   All |            Heap used for points |              |   4.62425 |     MB |
|   All |     Heap used for stored fields |              |   7.65836 |     MB |
|   All |                   Segment count |              |       523 |        |
|   All |                  Min Throughput | index-append |    111915 | docs/s |
|   All |               Median Throughput | index-append |    116063 | docs/s |
|   All |                  Max Throughput | index-append |    125471 | docs/s |
|   All |         50th percentile latency | index-append |    286.41 |     ms |
|   All |         90th percentile latency | index-append |   568.223 |     ms |
|   All |         99th percentile latency | index-append |   1352.39 |     ms |
|   All |       99.9th percentile latency | index-append |   2526.76 |     ms |
|   All |      99.99th percentile latency | index-append |   3299.07 |     ms |
|   All |        100th percentile latency | index-append |   3329.02 |     ms |
|   All |    50th percentile service time | index-append |    286.41 |     ms |
|   All |    90th percentile service time | index-append |   568.223 |     ms |
|   All |    99th percentile service time | index-append |   1352.39 |     ms |
|   All |  99.9th percentile service time | index-append |   2526.76 |     ms |
|   All | 99.99th percentile service time | index-append |   3299.07 |     ms |
|   All |   100th percentile service time | index-append |   3329.02 |     ms |

Thanks for your help.

If you have a very powerful server you may need to tweak the default settings in order to fully saturate the node, e.g. by increasing the level of concurrency. The operations and settings are defined in the rally-tracks repository.

@Christian_Dahlqvist Thanks for your help. I am consistently seeing the CPUs less than 50% utilized. Which settings should I look at in order to change the concurrency? Do you mean the threadpool settings?

I see that the default threadpool type for bulk and index is fixed, and the min/max is already quite high (32). I increased the queue size of the index and bulk pools to 100000. Are there any other settings that would use more CPUs for faster ingest?

I was referring to the number of concurrent connections that rally used, which is specified in the track. I see no evidence that you need to change anything in the Elasticsearch config at this point.

Go to the host where Rally is running, go to ~/.rally/benchmarks/tracks/default/logging/challenges, and edit the default.json file. There you can increase the number of clients Rally will use for indexing. I would recommend starting by doubling it and seeing what difference that makes.

When comparing the indexing rates achieved in this Rally benchmark to what you get with your own data, note that the difference in document size will have a significant impact. This set of tests should, however, at least show you how to go about saturating your server, and it should not be too hard to create a custom Rally track that uses your data.
