I am writing a dummy data file, fakeData.json, with a Python script (docGen.py) to use as the data file for a custom Rally track. When I try to execute the race, I get the following error:
[ERROR] Cannot race. ('Could not execute benchmark', UnicodeDecodeError('utf-8', b'BZh91AY&SY\x97\xc7e\xbe\x00\x14/_\x80\x10P\x07\x7f\xf0?\xff\xff\xf0\xbf\xef\xffja\xf6\xf9M\xb8\xc9:\x91\xa2\xa6\xde\xbcUU\x1f6-eR\xda\x9c\xbd\x04o\xeb^n~h\xeby\xfb0<<k\xa3da,\x8a\x93\xa9\x19x\xd6\xca+\x8dY\x05\xbd\xc4|\x91\xb1\x1f\x92?\xf8\xbb\x92)\xc2\x84\x84\xbe;-\xf0', 10, 11, 'invalid start byte'))
I understand that this has something to do with the encoding of the extracted JSON file containing the documents. I tried setting environment variables such as LC_ALL=en_US.UTF-8 and LC_CTYPE=en_US.UTF-8, but that does not solve the problem.
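For what it's worth, the bytes in the error start with b'BZh91AY&SY', which is the standard bzip2 stream header, so it may be worth checking whether the file Rally actually reads is bzip2-compressed rather than plain text. A minimal sketch (the helper name and path are illustrative, not part of Rally):

```python
import bz2

def looks_like_bzip2(path):
    """Return True if the file starts with the bzip2 magic bytes ('BZh')."""
    with open(path, 'rb') as f:
        return f.read(3) == b'BZh'

# For comparison: any bzip2-compressed payload begins with the same
# b'BZh91AY&SY' prefix that appears in the error message above.
sample = bz2.compress(b'{"id": "ABC"}\n')
print(sample[:10])  # b'BZh91AY&SY'
```

If the check returns True for the file being benchmarked, the UnicodeDecodeError would be explained by compressed bytes being decoded as UTF-8 text.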
docGen.py
from faker import Faker
import random
import string
import json
import io
fake = Faker()
def data(records):
    for i in range(records):
        yield {
            "id": ''.join(random.choices(string.ascii_uppercase + string.digits, k=32)),
            "sessionId": ''.join(random.choices(string.ascii_uppercase + string.digits, k=24)),
        }

d = data(10)
with io.open('fakeData.json', 'w', encoding='utf-8') as f:
    for record in d:
        f.write(json.dumps(record, ensure_ascii=False))
        f.write('\n')
print('Done')
Sample: fakeData.json
{"id": "MELU1V867SRTPBVHWKKFIGHEDGJV54DP", "sessionId": "AE5DBUIM0UEDETE78KGAFP2D"}
{"id": "YWHA80Q1J29CXFQX1A2BYSUO3OCOQMJR", "sessionId": "EGF4RH1T3ZG1ZI2V92ZDGTIW"}
{"id": "GVQYOZAL8VCSM5C6UV9QJNLZ9WPC3299", "sessionId": "D458X82TJIFEMO5KUDO2Y8DR"}
{"id": "4NK6QE0E2D4RJBQ61J0D2M5VD5OAAEOC", "sessionId": "6RFR40SIIRLWA1N9FCV2CFH5"}
Any pointers?