Elasticsearch parallel bulk using Python - issue with json

Hi All -

I am a newbie with Elasticsearch and I am encountering a strange issue.
Specifications: I have a JSON file of size 0.5 GB,
and I am using Python 3.6 with Elasticsearch 6.3.
I am using a parallel bulk call.


from collections import deque
from elasticsearch import helpers, TransportError

try:
    deque(helpers.parallel_bulk(es, read_json(filename), request_timeout=60, raise_on_error=True, raise_on_exception=True), maxlen=0)
except TransportError as e:
    print(e)

Issue #1: I am encountering a message saying:
POST https://XXXXXXXXXXXXXXXXXX/_bulk [status:413 request:192.868s]

An exception is raised, and the job fails or misses inserting some data.

How can I handle this programmatically?
How can I print out or redirect the records that are getting dropped?

When I use another, bigger JSON file of 2 GB (larger than the previous one, but in the exact same format), it does not throw any exceptions and inserts everything.

Am I missing something here? I am not sure what the issue is.

Any thoughts? I really appreciate your time and help.

So, the format of the two JSON files may be the same, but the content obviously isn't. Elasticsearch is choking on document data from the smaller file because the data is "too large" (<--- status code 413 is REQUEST_ENTITY_TOO_LARGE).

To find out which record it is, you might be able to just do a quick look through the file to find the longest line(s), perhaps? (I'm not sure how they're stored, but I assume a doc per line).
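Assuming the file is newline-delimited JSON (one document per line), a quick sketch for finding the largest lines, which are the most likely culprits (the function name and path are made up for illustration):

```python
def largest_lines(path, top=5):
    """Return (byte_length, line_number) pairs for the `top` longest
    lines of a newline-delimited JSON file, largest first."""
    sizes = []
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            sizes.append((len(line), lineno))
    return sorted(sizes, reverse=True)[:top]
```

Reading in binary mode means the reported sizes are the actual byte counts Elasticsearch would receive, not Unicode character counts.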

Otherwise, you could just avoid the convenience of parallel_bulk(...) and code it up yourself, thereby finding which line blows it up.
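One way to do that without giving up the helpers entirely: helpers.streaming_bulk with raise_on_error=False yields an (ok, result) pair for every document, so failures can be collected instead of raised. A minimal sketch of the collection step (the streaming_bulk call itself is shown commented out, since es, read_json, and filename come from the original post and need a live cluster):

```python
def collect_failures(results):
    """Split the (ok, result) pairs yielded by helpers.streaming_bulk
    into successful and failed per-document results."""
    succeeded, failed = [], []
    for ok, result in results:
        (succeeded if ok else failed).append(result)
    return succeeded, failed

# Assumed usage against a live cluster:
# from elasticsearch import helpers
# succeeded, failed = collect_failures(
#     helpers.streaming_bulk(es, read_json(filename),
#                            request_timeout=60,
#                            raise_on_error=False,
#                            raise_on_exception=False))
# for item in failed:
#     print(item)  # each item carries the action and the error reason
```

This also answers the "redirect the dropped records" question: write the failed list to a file and retry or inspect it later.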

Or, you could set a debug breakpoint in the Python code where the exception is caught to see the doc.

Or, you might be able to modify the Elasticsearch configuration (http.max_content_length) to allow larger request payloads via REST.
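For reference, the setting that produces the 413 is http.max_content_length in elasticsearch.yml, which defaults to 100mb; a sketch of raising it (200mb is an arbitrary example value):

```yaml
# elasticsearch.yml — maximum HTTP request body size.
# Requests larger than this are rejected with 413.
http.max_content_length: 200mb
```

Alternatively, you can attack it from the client side: parallel_bulk accepts chunk_size and max_chunk_bytes parameters, so you can cap the size of each _bulk request below the server's limit instead of raising the limit.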

Hope this helps.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.