Need to test an index with terabytes of data, how can I do it?

I need to test the performance of appending to an index with terabytes of data, and the index mapping is custom.
Is there a solution to this scenario?

Refer to this topic: Increase data size in Rally existing tracks

Does it support looping writes to a fixed index?

I tried, but got an error.

Eventually I did this by writing the data to large files.
However, compressing large files is too slow.
Is there a big difference between source-file performance measured using the original file documents.json and the compressed file documents.json.bz2?
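
For reference, this is roughly the kind of script I mean (the field names and the document count here are just placeholders, not our real business log format):

```python
import bz2
import json
import random

# Placeholder field layout; in practice this would match our custom index mapping.
def random_doc(i):
    return {
        "@timestamp": f"2021-01-01T00:00:{i % 60:02d}",
        "level": random.choice(["INFO", "WARN", "ERROR"]),
        "message": f"synthetic event {i}",
    }

NUM_DOCS = 1_000_000  # scaled up as needed to reach the target corpus size

with open("documents.json", "w") as plain, bz2.open("documents.json.bz2", "wt") as packed:
    for i in range(NUM_DOCS):
        line = json.dumps(random_doc(i)) + "\n"
        plain.write(line)   # uncompressed newline-delimited JSON corpus
        packed.write(line)  # bz2-compressed copy of the same corpus
```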

Have a look at the rally-eventdata-track. Unlike other tracks, it does not rely on data in files but instead generates data at runtime based on a set of probability distributions. This makes it possible to generate very large amounts of data with just track configuration. You can use it as is with a modified config, or use it as a base for building your own track that handles your particular type of data.
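
To illustrate the idea of runtime generation from probability distributions (this is a simplified sketch, not the actual rally-eventdata-track code, and the fields and distributions are made up):

```python
import json
import random

# Made-up distributions; the real track models realistic event-data/log fields.
STATUS_CODES = [200, 301, 404, 500]
STATUS_WEIGHTS = [0.85, 0.05, 0.08, 0.02]

def generate_event():
    """Build one synthetic log event by sampling from fixed distributions."""
    return {
        "status": random.choices(STATUS_CODES, weights=STATUS_WEIGHTS)[0],
        "bytes_sent": int(random.lognormvariate(8, 1.5)),  # long-tailed response sizes
        "client_ip": f"10.{random.randint(0, 255)}.{random.randint(0, 255)}.{random.randint(1, 254)}",
    }

if __name__ == "__main__":
    # Emit a small sample; in a track this would feed bulk requests instead of stdout.
    for _ in range(5):
        print(json.dumps(generate_event()))
```

Because the events are generated on the fly, the amount of data is limited only by how long the challenge runs, not by how much disk you can fill with source files.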

This blog post describes how it was used to generate 4 TB of indexed data for a set of storage benchmarks. This video also discusses this track and its use.

This seems to be an extension based on the event-data log type, but my scenario requires testing with our actual business logs. Can rally-eventdata-track extend the data volume with a custom index mapping?

I suspect you may need to create a new custom track.

Yes. Does rally-eventdata-track support generating terabytes of data through a custom track?
By the way, does source-file support configuring more than one file? That would make it convenient to increase or decrease the document count for different requirements.

If the event format created by the rally-eventdata-track can be used, it is relatively easy to create a new challenge that generates a very large amount of data. You may also be able to alter the mappings used if necessary. If you need a specific event format and mappings, you will probably have to customize the track or generate the files yourself.
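
As a rough sketch of what customizing the track could look like: Rally lets a custom track register a parameter source in its track.py that builds bulk bodies at runtime. The names below (custom-bulk-source, the field layout) are made up, and the exact parameter-source interface and expected bulk parameters can differ between Rally versions, so treat this only as a starting point:

```python
# track.py of a custom Rally track (sketch; check the docs of your Rally version).
import json
import random


class CustomBulkParamSource:
    """Generates bulk request bodies at runtime instead of reading them from files."""

    def __init__(self, track, params, **kwargs):
        self.index_name = params.get("index", "custom-logs")
        self.bulk_size = params.get("bulk-size", 5000)

    def partition(self, partition_index, total_partitions):
        # Every client can share the same generator since documents are synthetic.
        return self

    def params(self):
        lines = []
        for i in range(self.bulk_size):
            # Placeholder fields; replace with documents matching your custom mapping.
            lines.append(json.dumps({"index": {}}))
            lines.append(json.dumps({"level": random.choice(["INFO", "ERROR"]),
                                     "message": f"business log line {i}"}))
        # Keys here follow what the bulk operation typically consumes; verify against your version.
        return {
            "body": "\n".join(lines) + "\n",
            "action-metadata-present": True,
            "bulk-size": self.bulk_size,
            "index": self.index_name,
        }


def register(registry):
    # The registered name is then referenced from the bulk operation in track.json.
    registry.register_param_source("custom-bulk-source", CustomBulkParamSource)
```

With something like this, the custom mapping lives in the track's index definition and the data volume is controlled by how long the bulk task runs.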

I am still not sure I fully understand what you are looking to test. Could you please elaborate on what you are looking to test and achieve? Are you looking to index into a set of time-based indices and see how the cluster performs with large amounts of data, or are you going to index into a single index that will grow very large? What is your use case?

We want to test sending 2 TB of specific data to an index to see how large the cluster can grow. I have now solved this problem, but a few things puzzled me during testing.

First question: does the final throughput result include replicas? I think the result covers only the primary shards, and it seems to be calculated from the samples as sum(bulk_size) / time_period. Is that right?
I have read How Write throughput is calculated in Rally - #2 by dliappis
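
To make sure I understand the formula, here is a toy example of what I mean by sum(bulk_size) / time_period (the numbers are made up):

```python
# Toy illustration of throughput = sum(bulk_size) / time_period over the sampling window.
samples = [
    {"bulk_size": 5000, "timestamp": 0.0},
    {"bulk_size": 5000, "timestamp": 1.2},
    {"bulk_size": 5000, "timestamp": 2.4},
]

docs_indexed = sum(s["bulk_size"] for s in samples)                    # 15000 docs
time_period = samples[-1]["timestamp"] - samples[0]["timestamp"]       # 2.4 s

print(f"throughput = {docs_indexed / time_period:.0f} docs/s")         # 6250 docs/s
```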

Second question: when index_append starts, I use iostat to monitor the I/O, and sometimes the read bytes are zero. Why is that? The bulk_indexing_client_num is 64.

Third question: how can I verify that Rally has started the number of clients that I set?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.