Dec 7th, 2020: [EN] Using Rally as a data import/export tool

French version

This post will show you how you can use Rally to export data from one cluster and import it into another.

The idea is to extract all the data from one or more indices into a (big) flat file, then reuse it later to import the data easily into another cluster.
Rally makes this very easy.

Steps to follow:

Installing Rally

Install Rally by following the official documentation. This step is optional if you are running Rally with Docker.

You can check that it is working correctly by running:

╰─$ esrally --version

Exporting data

To export the data, create a track (Rally's name for a scenario) with a command like:

╰─$ esrally create-track \
            --track=mytrack \
            --target-hosts=URL.elastic-cloud.com:9243 \
            --client-options="timeout:60,use_ssl:true,verify_certs:true,basic_auth_user:'USER',basic_auth_password:'PASSWORD'" \
            --indices="kibana_sample_data_logs,kibana_sample_data_ecommerce,kibana_sample_data_flights" \
            --output-path=~/Documents/Consulting/sandbox/rally

With Docker, the command becomes:

╰─$ docker run -v ~/Documents/Consulting/sandbox/rally:/mnt \
           elastic/rally create-track \
           --track=mytrack-docker \
           --target-hosts=URL.elastic-cloud.com:9243 \
           --client-options="timeout:60,use_ssl:true,verify_certs:true,basic_auth_user:'USER',basic_auth_password:'PASSWORD'" \
           --indices="kibana_sample_data_logs,kibana_sample_data_ecommerce,kibana_sample_data_flights" \
           --output-path=/mnt

The following options are used:

  • --track: the name of your scenario
  • --target-hosts: the cluster Rally should export data from (multiple hosts are allowed)
  • --indices: the indices you would like to export
  • --output-path: where to write the exported data and the configuration
  • --client-options: client parameters, such as authentication

Once done, you can check what has been generated by running the following command:

╰─$ esrally info --track-path=~/Documents/Consulting/sandbox/rally/mytrack

This gives you an idea of the overall size of the generated track:

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/
Showing details for track [mytrack]:
* Description: Tracker-generated track for mytrack
* Documents: 31,808
* Compressed Size: 2.1 MB
* Uncompressed Size: 30.1 MB
Schedule:
----------
1. delete-index
2. create-index
3. cluster-health
4. bulk (8 clients)
-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------
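The Documents figure reported above can be cross-checked against the uncompressed *-documents.json files Rally generates: each file holds one JSON document per line, so the line count is the document count. A minimal sketch with a stand-in file (replace it with a real file such as mytrack/kibana_sample_data_logs-documents.json):

```shell
# Each *-documents.json file stores one JSON document per line,
# so counting lines counts documents.
# demo-documents.json is a stand-in for a real exported file.
printf '{"message": "doc 1"}\n{"message": "doc 2"}\n{"message": "doc 3"}\n' > demo-documents.json
wc -l < demo-documents.json   # the document count (3 here)
```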

You can then compress all the generated files into a single archive and ship it to the infrastructure where your destination cluster is running.
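For example, with tar (the paths below are placeholders for this sketch; point TRACK_DIR at the directory create-track actually produced):

```shell
# Bundle the generated track (track.json plus the data files) into one archive.
# TRACK_DIR is a placeholder path, not the real track location.
TRACK_DIR="${TRACK_DIR:-rally-export/mytrack}"
mkdir -p "$TRACK_DIR"   # your real track directory already exists; this keeps the demo runnable
tar -czf mytrack.tar.gz -C "$(dirname "$TRACK_DIR")" "$(basename "$TRACK_DIR")"

# On the destination machine, unpack before pointing esrally at it:
tar -xzf mytrack.tar.gz
```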

Importing data

In the following example, we import into a local development cluster. Since that cluster is not secured, the --client-options parameter is not needed.

╰─$ esrally --track-path=~/Documents/Consulting/sandbox/rally/mytrack \
            --target-hosts=127.0.0.1:9200 \
            --pipeline=benchmark-only

Once done, Rally prints statistics about your cluster (the ingestion rate, for example):

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/
[INFO] Preparing file offset table for [/Users/laurent/Documents/Consulting/sandbox/rally/mytrack/kibana_sample_data_logs-documents.json] ... [OK]
[INFO] Preparing file offset table for [/Users/laurent/Documents/Consulting/sandbox/rally/mytrack/kibana_sample_data_ecommerce-documents.json] ... [OK]
[INFO] Preparing file offset table for [/Users/laurent/Documents/Consulting/sandbox/rally/mytrack/kibana_sample_data_flights-documents.json] ... [OK]
[INFO] Racing on track [mytrack] and car ['external'] with version [7.9.2].
[WARNING] indexing_total_time is 10 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running cluster-health                                                         [100% done]
Running bulk                                                                   [100% done]
------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
|                                                         Metric |   Task |       Value |   Unit |
|---------------------------------------------------------------:|-------:|------------:|-------:|
|                     Cumulative indexing time of primary shards |        |       0.397 |    min |
|             Min cumulative indexing time across primary shards |        | 0.000166667 |    min |
|          Median cumulative indexing time across primary shards |        |   0.0681333 |    min |
|             Max cumulative indexing time across primary shards |        |    0.247217 |    min |
|            Cumulative indexing throttle time of primary shards |        |           0 |    min |
|    Min cumulative indexing throttle time across primary shards |        |           0 |    min |
| Median cumulative indexing throttle time across primary shards |        |           0 |    min |
|    Max cumulative indexing throttle time across primary shards |        |           0 |    min |
|                        Cumulative merge time of primary shards |        |     0.01685 |    min |
|                       Cumulative merge count of primary shards |        |           1 |        |
|                Min cumulative merge time across primary shards |        |           0 |    min |
|             Median cumulative merge time across primary shards |        |           0 |    min |
|                Max cumulative merge time across primary shards |        |     0.01685 |    min |
|               Cumulative merge throttle time of primary shards |        |           0 |    min |
|       Min cumulative merge throttle time across primary shards |        |           0 |    min |
|    Median cumulative merge throttle time across primary shards |        |           0 |    min |
|       Max cumulative merge throttle time across primary shards |        |           0 |    min |
|                      Cumulative refresh time of primary shards |        |      0.0601 |    min |
|                     Cumulative refresh count of primary shards |        |          23 |        |
|              Min cumulative refresh time across primary shards |        | 0.000883333 |    min |
|           Median cumulative refresh time across primary shards |        |  0.00673333 |    min |
|              Max cumulative refresh time across primary shards |        |   0.0337333 |    min |
|                        Cumulative flush time of primary shards |        |           0 |    min |
|                       Cumulative flush count of primary shards |        |           0 |        |
|                Min cumulative flush time across primary shards |        |           0 |    min |
|             Median cumulative flush time across primary shards |        |           0 |    min |
|                Max cumulative flush time across primary shards |        |           0 |    min |
|                                        Total Young Gen GC time |        |       0.062 |      s |
|                                       Total Young Gen GC count |        |           5 |        |
|                                          Total Old Gen GC time |        |           0 |      s |
|                                         Total Old Gen GC count |        |           0 |        |
|                                                     Store size |        | 3.27006e-05 |     GB |
|                                                  Translog size |        |   0.0315683 |     GB |
|                                         Heap used for segments |        |    0.365074 |     MB |
|                                       Heap used for doc values |        |    0.111397 |     MB |
|                                            Heap used for terms |        |    0.219269 |     MB |
|                                            Heap used for norms |        |   0.0147095 |     MB |
|                                           Heap used for points |        |           0 |     MB |
|                                    Heap used for stored fields |        |   0.0196991 |     MB |
|                                                  Segment count |        |          40 |        |
|                                                 Min Throughput |   bulk |     1100.03 | docs/s |
|                                              Median Throughput |   bulk |     2616.48 | docs/s |
|                                                 Max Throughput |   bulk |     6598.72 | docs/s |
|                                        50th percentile latency |   bulk |     727.761 |     ms |
|                                        90th percentile latency |   bulk |     2123.67 |     ms |
|                                       100th percentile latency |   bulk |     2156.42 |     ms |
|                                   50th percentile service time |   bulk |     727.761 |     ms |
|                                   90th percentile service time |   bulk |     2123.67 |     ms |
|                                  100th percentile service time |   bulk |     2156.42 |     ms |
|                                                     error rate |   bulk |           0 |      % |
--------------------------------
[INFO] SUCCESS (took 56 seconds)
--------------------------------

Et voilà! The data has been imported into your brand new cluster. :slight_smile:


Laurent - Thanks for this post, I have used it several times over the past couple of weeks.
