Create simulated data for ingestion using Rally

lance.zukel · June 28, 2022, 7:11pm

I would like to be able to perform the following:

Have a sample templated document for ingestion
Have a custom class that randomizes data to be inserted into the templated document
Be able to specify the number of documents to be generated and ingested into elasticsearch
Not use static files for ingestion
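The requirements above (a templated document, a custom class that randomizes its values, a configurable document count, no static files) can be sketched as a small Python class. This is a minimal sketch, not anything that ships with Rally. The class name `RandomDocumentFactory` and every field name in the template are hypothetical placeholders for your own production fields. Documents are yielded lazily, so nothing is read from disk.

```python
import json
import random
import uuid
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterator, Optional


class RandomDocumentFactory:
    """Fills a fixed document template with randomized field values.

    All field names here are hypothetical; replace them with the fields
    from your own production documents.
    """

    def __init__(self, seed: Optional[int] = None) -> None:
        # A dedicated Random instance makes runs reproducible when seeded.
        self.rng = random.Random(seed)
        self.start = datetime(2022, 1, 1, tzinfo=timezone.utc)

    def make(self) -> Dict[str, Any]:
        """Return one document built from the (hypothetical) template."""
        offset = timedelta(seconds=self.rng.randint(0, 86_400 * 30))
        return {
            "@timestamp": (self.start + offset).isoformat(),
            "host": {"name": f"host-{self.rng.randint(1, 50):03d}"},
            "status": self.rng.choice(["ok", "warn", "error"]),
            # Free-text field, useful when comparing text vs keyword mappings.
            "message": " ".join(
                self.rng.choices(["disk", "cpu", "login", "timeout", "retry"], k=6)
            ),
            # UUID-formatted ID drawn from the seeded generator.
            "request_id": str(uuid.UUID(int=self.rng.getrandbits(128))),
            "bytes": self.rng.randint(100, 50_000),
        }

    def generate(self, count: int) -> Iterator[Dict[str, Any]]:
        """Lazily yield `count` randomized documents."""
        for _ in range(count):
            yield self.make()


if __name__ == "__main__":
    factory = RandomDocumentFactory(seed=42)
    for doc in factory.generate(3):
        print(json.dumps(doc))
```

Passing the same seed reproduces the same documents, which helps when comparing benchmark runs against each other.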

Any tips or hints on how to do the above?

stephenb · June 28, 2022, 10:53pm

Hi @lance.zukel

Did you look at this

Or this

lance.zukel · June 29, 2022, 1:58pm

I was hoping to use Rally for this.
We are looking to benchmark ingestion, measure how keyword vs. text types affect different fields, and load enough data into our dev cluster to simulate the data in our prod cluster more accurately.

stephenb · June 29, 2022, 2:38pm

@lance.zukel Oh, welcome to the community! If you specifically want to use Rally, I would open a topic with Rally in the subject line (or edit this one). There are folks here who know about Rally (not me).

Your current subject won't bring in the Rally folks.

lance.zukel · June 29, 2022, 2:49pm

I guess I figured the Rally tag would suffice, but I have modified the title of my post. Thank you!

stephenb · June 29, 2022, 2:53pm

Didn't even notice it. Perhaps someone will... Good subject lines get noticed, just my experience.

Quentin_Pradet · June 30, 2022, 12:50pm

Oh, as a Rally developer I noticed (the tag is indeed enough), but we don't have any tool to generate data in Rally itself. Rally only works with data that has already been generated.

Can you write a script to generate the data you need?
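Along the lines of that suggestion, here is a minimal sketch of a standalone generator script whose output a custom Rally track could then consume. It assumes the corpus format Rally's own tracks use, as far as I know: one JSON document per line, referenced from the track's `corpora` section together with a `document-count` and (optionally) an `uncompressed-bytes` value. Check the Rally track reference for the exact fields. The function names `make_doc` and `write_corpus`, the file name, and all document fields are hypothetical.

```python
import json
import os
import random
from typing import Any, Dict


def make_doc(rng: random.Random, i: int) -> Dict[str, Any]:
    """Build one hypothetical document; adapt fields to your production mapping."""
    return {
        "id": i,
        "service": rng.choice(["auth", "billing", "search"]),
        "latency_ms": round(rng.uniform(1, 500), 2),
        "message": rng.choice(["request ok", "request failed", "cache miss"]),
    }


def write_corpus(path: str, count: int, seed: int = 0) -> Dict[str, int]:
    """Write `count` documents as newline-delimited JSON, one document per line."""
    rng = random.Random(seed)
    # newline="\n" keeps LF line endings on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for i in range(count):
            f.write(json.dumps(make_doc(rng, i)) + "\n")
    # Numbers you would copy into the track's corpus definition (assumed field names).
    return {"document-count": count, "uncompressed-bytes": os.path.getsize(path)}


if __name__ == "__main__":
    stats = write_corpus("documents.json", count=100_000, seed=42)
    print(stats)
```

Because the generator is seeded, rerunning the script with the same `seed` and `count` produces an identical corpus, so benchmark runs stay comparable.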

lance.zukel · June 30, 2022, 2:54pm

This would be a great feature to have in Rally.
I think it would be very helpful for users to be able to specify a document template, arrays of random/pseudo-random values, and the number of documents to generate.
This would allow simulating and benchmarking documents and fields specific to a particular deployment, and so better replicate actual workloads.
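Until something like that exists in Rally, the proposed workflow (template + arrays of values + document count) can be approximated outside Rally. This minimal sketch uses Python's `string.Template` with `$`-placeholders. The template, the `VALUES` arrays, and the `render` function are all hypothetical. Values are substituted as raw text, not JSON-escaped, so keep them free of quotes and backslashes.

```python
import json
import random
from string import Template
from typing import Dict, Iterator, List

# Hypothetical document template; each $name is filled from VALUES.
TEMPLATE = Template(
    '{"user": "$user", "action": "$action", "region": "$region", "count": $count}'
)

# Hypothetical arrays of candidate values, one per placeholder.
VALUES: Dict[str, List[str]] = {
    "user": ["alice", "bob", "carol", "dave"],
    "action": ["login", "logout", "purchase", "view"],
    "region": ["us-east", "eu-west", "ap-south"],
    "count": [str(n) for n in range(1, 100)],
}


def render(n: int, seed: int = 0) -> Iterator[dict]:
    """Yield `n` documents by substituting randomly chosen values into TEMPLATE."""
    rng = random.Random(seed)
    for _ in range(n):
        picks = {field: rng.choice(options) for field, options in VALUES.items()}
        # substitute() raises KeyError if a placeholder has no value array.
        yield json.loads(TEMPLATE.substitute(picks))


if __name__ == "__main__":
    for doc in render(5, seed=42):
        print(doc)
```

Keeping the template and the value arrays as plain data makes it easy to swap in fields that mirror a specific deployment without touching the generation logic.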

Andrea_Spacca · July 7, 2022, 3:14am

Hello @lance.zukel,

you can take a look at GitHub - elastic/elastic-integration-corpus-generator-tool: Command line tool used for generating events corpus dynamically given a specific integration

It's limited to integration packages, though, so I'm not sure it fits your use case.

system · August 4, 2022, 3:15am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
