Hi Vikash,
In the standard tracks, Rally does not pull real-time data from anywhere. Rally's load generators index documents read from a local JSON "corpus". The location of that corpus is configured in the track's JSON file, and the corpus itself is usually downloaded from an S3 bucket.
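To make that concrete, here is a rough sketch of how you could see where a track expects its corpus to come from. It assumes the track file has a "corpora" section with "base-url" and "source-file" entries. The corpus name, URL and file name below are made-up placeholders, not taken from any real track, and real track files may contain template syntax that plain JSON parsing will not handle.

```python
import json

# Hypothetical, trimmed-down track definition. The names and the URL are
# placeholders for illustration only, not values from a real Rally track.
track_json = """
{
  "corpora": [
    {
      "name": "example-corpus",
      "base-url": "https://example.com/corpora/example",
      "documents": [
        {
          "source-file": "documents.json.bz2",
          "document-count": 1000
        }
      ]
    }
  ]
}
"""


def corpus_locations(track):
    """Return the location each corpus file would be fetched from."""
    locations = []
    for corpus in track.get("corpora", []):
        # Assumption: the base URL is set at the corpus level.
        base_url = corpus.get("base-url", "").rstrip("/")
        for doc in corpus.get("documents", []):
            source = doc["source-file"]
            # Without a base URL, the file is expected to exist locally already.
            locations.append(f"{base_url}/{source}" if base_url else source)
    return locations


locations = corpus_locations(json.loads(track_json))
print(locations)
```

The point is that everything Rally indexes comes from these static files, so the data is the same on every run.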