BulkIndexError using Elasticsearch in Python

brendanwalker · February 14, 2020, 4:46pm

I am writing a program to search through really large (>400 MB) CSV files provided by the government. Some of these files have over 1,000,000 rows. This is a local program that roughly 5 people in my company will use to help them do their jobs better. I am using Python (x64) and have tried both the native csv module and a Pandas import. Both methods produce the same result. I can read a small test CSV file perfectly fine and index its rows into the ES index. But when I attempt to index the large CSV file I get BulkIndexErrors and nothing indexes. What would be the proper way to index a large CSV file into Elasticsearch using Python?

with open(editContract.get()) as f:
    csv_data = csv.DictReader(f, dialect='excel')
    for row in csv_data:
        helpers.bulk(es, row, index="contract_search", doc_type='_doc')

raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
elasticsearch.helpers.errors.BulkIndexError: ('45 document(s) failed to index...'reason': 'Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes'

rugenl · February 14, 2020, 5:05pm

Look at this issue; most of the time the problem turned out to be in the format of the data.

Also, you don't have to call bulk one row at a time. I build the data in a loop:

es_out.append(dict(es_row))

Then send it all.

spf_index = bulk(client, es_out)

Of course, keep each batch to a reasonable number of rows.
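A minimal sketch of that batching idea, assuming a recent elasticsearch-py client. The helper names generate_actions and index_csv and the sample columns are made up for illustration; the index name is taken from the question. Instead of building one big list, it feeds helpers.bulk a generator and lets the chunk_size argument cap how many rows go into each request:

```python
import csv
import io


def generate_actions(csv_file, index_name):
    """Yield one bulk action (a plain dict) per CSV row."""
    reader = csv.DictReader(csv_file, dialect="excel")
    for row in reader:
        # Each action names its target index and carries the row as the document.
        yield {"_index": index_name, "_source": dict(row)}


def index_csv(es, path, index_name="contract_search", chunk_size=500):
    """Stream a CSV file into Elasticsearch, chunk_size rows per request.

    Hypothetical helper: `es` is assumed to be an Elasticsearch client.
    """
    # Imported here so generate_actions can be tried without the client installed.
    from elasticsearch import helpers

    with open(path, newline="") as f:
        return helpers.bulk(es, generate_actions(f, index_name), chunk_size=chunk_size)


# Dry run on an in-memory sample (made-up columns), no cluster needed.
sample = io.StringIO("id,vendor\n1,Acme\n2,Globex\n")
actions = list(generate_actions(sample, "contract_search"))
```

Because the actions come from a generator, the whole file never has to sit in memory at once, which matters for files with a million rows.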

brendanwalker · February 14, 2020, 5:30pm

Forgive me, as I have little experience with Elasticsearch. Is there a difference between a Python dictionary and the object you are suggesting I build by appending rows? Am I supposed to transform the data in some way? I don't fully understand what you are suggesting with the code provided.
Also, I looked at the link you provided, and I don't see anything particularly helpful there. The code works perfectly fine for small CSV files as it is. It does not work for large CSV files with thousands to millions of rows. I don't understand why.
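
On the dictionary question: helpers.bulk takes an iterable of actions, so a single row dict passed on its own gets iterated, and iterating a dict yields its keys rather than the row. The sketch below uses a made-up two-column row to show the difference between passing the dict and passing a list that contains it. This may be what produces the "Compressor detection" error above, since column names are not valid JSON documents, but the thread does not confirm it.

```python
# Hypothetical row, shaped like what csv.DictReader yields.
row = {"id": "1", "vendor": "Acme"}

# Passing the dict itself: the consumer iterates it and sees only the keys.
as_passed = list(row)

# Passing a list of dicts: the consumer sees one complete document per item.
as_intended = list([row])
```

Here as_passed holds the column names as bare strings, while as_intended holds the full row, which is the shape built up by es_out.append(dict(es_row)).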

