I am writing a program to search through really large (>400 MB) CSV files provided by the government. Some of these files have over 1,000,000 rows. This is a local program that roughly 5 people will use in my company to help them do their jobs better. I am using Python (x64) and have tried both the native csv module and Pandas; both produce the same result. I can read a small test CSV file perfectly fine and index its rows into the ES index, but when I attempt to import the large CSV file I get BulkIndexErrors and nothing indexes. What would be the proper way to index a large CSV file into Elasticsearch using Python?
with open(editContract.get()) as f:
    csv_data = csv.DictReader(f, dialect='excel')
    for row in csv_data:
        helpers.bulk(es, row, index="contract_search", doc_type='_doc')
raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
elasticsearch.helpers.errors.BulkIndexError: ('45 document(s) failed to index...', ... 'reason': 'Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes' ...)
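The likely cause of that error is that `helpers.bulk` expects an *iterable of actions*, but the loop passes it a single `row` dict on every iteration; the helper then iterates over the dict's keys (plain strings), which Elasticsearch cannot parse as documents. A sketch of the usual fix, streaming the file through a generator so the whole 400 MB file never sits in memory (the index name comes from the question; the host URL, file path, and function names are my assumptions):

```python
import csv

def generate_actions(csv_path, index_name="contract_search"):
    """Yield one bulk action per CSV row. A generator streams the file,
    so even a million-row CSV is never loaded into memory at once."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, dialect="excel"):
            # Each action is a dict; the row itself becomes the document body.
            yield {"_index": index_name, "_source": row}

def index_large_csv(csv_path):
    """Hypothetical driver: pass the generator to helpers.bulk ONCE,
    instead of calling it once per row."""
    from elasticsearch import Elasticsearch, helpers
    es = Elasticsearch("http://localhost:9200")  # assumed local node
    # chunk_size controls how many actions go into each bulk request.
    helpers.bulk(es, generate_actions(csv_path), chunk_size=1000)
```

Note that `doc_type='_doc'` can be dropped on Elasticsearch 7+, where mapping types are deprecated; `helpers.streaming_bulk` works the same way if you want per-document success/failure feedback while indexing.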