Create simulated data for ingestion using Rally

I would like to be able to perform the following:

  1. Have a sample templated document for ingestion
  2. Have a custom class that randomizes data to be inserted into the templated document
  3. Be able to specify the number of documents to be generated and ingested into Elasticsearch
  4. Not use static files for ingestion
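
The four requirements above can be sketched in plain Python, independent of Rally. This is only a rough illustration: the template fields, the `DocumentRandomizer` class, and its method names are all hypothetical, and the documents are produced in memory by a generator rather than read from a static file.

```python
import copy
import random
import uuid

# Hypothetical template; the field names are illustrative only.
TEMPLATE = {"id": None, "status": None, "bytes": None, "message": "static text"}


class DocumentRandomizer:
    """Fills a copy of the template with random values (hypothetical helper)."""

    def __init__(self, template, seed=None):
        self.template = template
        # A private Random instance keeps runs reproducible when a seed is given.
        self.rng = random.Random(seed)

    def next_document(self):
        # Deep-copy so the shared template is never modified.
        doc = copy.deepcopy(self.template)
        doc["id"] = str(uuid.UUID(int=self.rng.getrandbits(128)))
        doc["status"] = self.rng.choice(["ok", "warn", "error"])
        doc["bytes"] = self.rng.randint(0, 10_000)
        return doc

    def documents(self, count):
        # Lazily yield exactly `count` documents; nothing is written to disk.
        for _ in range(count):
            yield self.next_document()


docs = list(DocumentRandomizer(TEMPLATE, seed=1).documents(5))
```

The generator could be handed to whatever client performs the ingestion; how that is wired into a benchmark is left open here.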

Any tips/hints/help on being able to perform the above?

Hi @lance.zukel

Did you look at this

Or this

I was hoping to use Rally for this.
We are looking to benchmark ingestion, compare the effects of keyword vs. text types on different fields, and generate enough data in our dev cluster to more accurately simulate the data in our prod cluster.

@lance.zukel Oh, welcome to the community... If you specifically want to use Rally, I would open a topic with Rally in the subject line (or edit this one). There are folks who know about Rally (not me :slight_smile: )

Your current subject won't bring the Rally folks :slight_smile:

I guess I figured the Rally tag would suffice. I have modified the title of my post... Thank you

Didn't even notice it.. :slight_smile: Perhaps someone will... Good Subject Lines get noticed... just my experience.

Oh, as a Rally developer I noticed (the tag is indeed enough), but we don't have any tool to generate data in Rally itself. Rally only works with data that is already generated.

Can you write a script to generate the data you need?
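
A minimal sketch of such a script, assuming the target is a newline-delimited JSON file (one document per line), which is the layout Rally document corpora use as far as I know. The field names, the `random_document` and `write_corpus` helpers, and the output file name are all made up for illustration.

```python
import json
import random
import tempfile
from pathlib import Path


def random_document(rng):
    # Hypothetical fields; replace with whatever your mapping needs.
    return {
        "host": f"host-{rng.randint(1, 50)}",
        "level": rng.choice(["INFO", "WARN", "ERROR"]),
        "latency_ms": round(rng.uniform(0.5, 500.0), 2),
    }


def write_corpus(path, count, seed=None):
    """Write `count` random documents to `path`, one JSON object per line."""
    rng = random.Random(seed)
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(count):
            f.write(json.dumps(random_document(rng)) + "\n")
    return path


# Example run: 1000 documents into a file in the system temp directory.
corpus_path = write_corpus(
    Path(tempfile.gettempdir()) / "generated-documents.json", count=1000, seed=42
)
```

The document count is then known up front (here 1000), which is handy if the file is later described in a track.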

This would be a great feature to have in Rally.
I think it would be very helpful for users to be able to specify a document template, an array of random or pseudo-random values for each field, and the number of documents to be generated.
This would allow simulation and benchmarking of documents and fields that are specific to a particular deployment, and so better replicate actual workloads.
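
To make the idea concrete, here is a rough sketch of what such a specification might look like. It is not an existing Rally feature: the `$name` placeholder convention, `TEMPLATE`, `VALUE_POOLS`, and `generate` are all hypothetical. Any string in the template that starts with `$` is replaced by a random pick from the pool of that name.

```python
import random

# Hypothetical template: "$name" strings refer to entries in VALUE_POOLS.
TEMPLATE = {
    "user": {"name": "$user", "region": "$region"},
    "action": "$action",
    "tags": ["$tag", "benchmark"],
}

# Hypothetical pools of candidate values, one list per placeholder.
VALUE_POOLS = {
    "user": ["alice", "bob", "carol"],
    "region": ["eu-west", "us-east"],
    "action": ["login", "logout", "purchase"],
    "tag": ["web", "mobile"],
}


def fill(node, pools, rng):
    """Recursively copy `node`, replacing "$name" strings with pool values."""
    if isinstance(node, dict):
        return {key: fill(value, pools, rng) for key, value in node.items()}
    if isinstance(node, list):
        return [fill(value, pools, rng) for value in node]
    if isinstance(node, str) and node.startswith("$"):
        return rng.choice(pools[node[1:]])
    return node  # literals are kept unchanged


def generate(template, pools, count, seed=None):
    """Yield `count` documents; the same seed gives the same sequence."""
    rng = random.Random(seed)
    for _ in range(count):
        yield fill(template, pools, rng)


sample = list(generate(TEMPLATE, VALUE_POOLS, count=10, seed=7))
```

Keeping the template and pools as plain data means they could live in a JSON file next to the benchmark definition, with only the number of documents changing between runs.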

hello @lance.zukel

you can take a look at elastic/elastic-integration-corpus-generator-tool on GitHub, a command line tool used for generating an events corpus dynamically given a specific integration

it's limited to integration packages, so I'm not sure it fits your use case

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.