Problem with bulk inserting JSON via Python


(Mariska Willemsen) #1

I have been trying to bulk insert a JSON file into Elasticsearch via Python (I'm very new to Elastic). I had to transform the data a little before putting it into Elastic. In the end I write the data to an NDJSON file and try to bulk insert it using the following code:
with open("/Users/mariska/Documents/jsontestje14.json") as json_file:
    body=json_file.read()

helpers.bulk(es, actions=body, index='jsononfagun6', doc_type='kenteken')

This yields the error:
Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes

I've tried numerous ways of changing the format of the file so that Elastic accepts it, but without success. It currently looks like this (an excerpt; the real file has many more lines):

{
"Kenteken": "WSFT54",
"Voertuigsoort": "Aanhangwagen",
"Merk": "GS",
"Handelsbenaming": "AC-2000 AC",
"Vervaldatum APK": "19/10/2018",
"Datum tenaamstelling": "19/09/2005",
"Bruto BPM": "nan",
"Inrichting": "open laadvloer",
"Aantal zitplaatsen": "nan",
"Eerste kleur": "N.v.t.",
"Tweede kleur": "N.v.t.",
"Aantal cilinders": "nan",
"Cilinderinhoud": "nan",
"Massa ledig voertuig": "5580.0",
"Toegestane maximum massa voertuig": "20000.0",
"Massa rijklaar": "nan",
"Maximum massa trekken ongeremd": "nan",
"Maximum trekken massa geremd": "nan",
"Retrofit roetfilter": "nan",
"Zuinigheidslabel": "nan",
"Datum eerste toelating": "19/09/2005",
"Datum eerste afgifte Nederland": "19/09/2005",
"Wacht op keuren": "Geen verstrekking in Open Data",
"Catalogusprijs": "nan",
"WAM verzekerd": "N.v.t.",
"Maximale constructiesnelheid (brom/snorfiets)": "nan"
}

There are several of these, all separated by newlines. It seems to parse every individual letter of every string separately, but I can't figure out the problem. Hopefully someone can help!


(Daniel Mitterdorfer) #2

Hi,

According to the docs, the actions parameter has to be an iterable of actions, one per document. The way you read the file now, body is of type str, which is itself an iterable of characters. That is the reason for the error you get: every single character is sent as a separate document.
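A small self-contained illustration of the difference, using io.StringIO in place of the real file and two made-up one-line records:

```python
import io

# Two made-up one-line JSON records standing in for the NDJSON file.
ndjson_text = '{"Kenteken": "WSFT54"}\n{"Kenteken": "ABC123"}\n'

# read() returns a single str; iterating over it yields one character at a
# time, so each character would become a separate bulk action.
as_string = io.StringIO(ndjson_text).read()
first_items_from_string = list(as_string)[:3]

# readlines() returns a list with one str per line, i.e. one document each
# (the trailing "\n" is kept on every line).
as_lines = io.StringIO(ndjson_text).readlines()

print(first_items_from_string)  # ['{', '"', 'K']
print(len(as_lines))            # 2
```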

Try reading the file line by line with:

with open("/Users/mariska/Documents/jsontestje14.json") as json_file:
    body=json_file.readlines()

Then it should work, as long as each document sits on its own line.
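The records in the question's sample are pretty-printed across many lines, so readlines() would cut each of them into fragments. One way to turn such a file into real NDJSON is to decode the objects one after another with json.JSONDecoder.raw_decode and write each back on a single line. This is a minimal sketch, assuming the file contains nothing but JSON objects separated by whitespace; the function name to_ndjson and the second sample record are made up for illustration.

```python
import json


def to_ndjson(text):
    """Return NDJSON (one compact JSON object per line) for a string holding
    concatenated, possibly pretty-printed JSON objects."""
    decoder = json.JSONDecoder()
    pos = 0
    out = []
    while True:
        # raw_decode does not skip leading whitespace, so step over the
        # newlines/spaces between objects ourselves.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # Parse one object starting at pos; raw_decode returns the object
        # and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        # ensure_ascii=False keeps non-ASCII characters readable.
        out.append(json.dumps(obj, ensure_ascii=False) + "\n")
    return "".join(out)


# Made-up excerpt in the same layout as the question's file.
sample = """{
  "Kenteken": "WSFT54",
  "Merk": "GS"
}
{
  "Kenteken": "ABC123",
  "Merk": "XYZ"
}
"""
print(to_ndjson(sample), end="")
```

Writing the result of to_ndjson(json_file.read()) to a new file gives one document per line, which readlines() then splits correctly. Note that this conversion still reads the whole file into memory.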

Note that this reads the whole file into memory at once, which may or may not be what you want. Alternatively, you can iterate over the lines of the file, collect them into a list, and send the bulk requests yourself without using the helper.
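For the manual route, here is a minimal sketch of the batching part, assuming one JSON document per line. It groups lines into batches and builds request bodies in the bulk API's format: an action line followed by the document, each terminated by a newline. The helper name bulk_bodies and the batch size of 500 are arbitrary choices for illustration, not part of any library.

```python
import json
from itertools import islice


def bulk_bodies(lines, index, batch_size=500):
    """Yield bulk API request bodies (NDJSON strings), each covering up to
    batch_size documents taken from an iterable of one-document lines."""
    # Same action for every document: index it into the given index.
    action = json.dumps({"index": {"_index": index}})
    # Lazily strip trailing newlines and drop blank lines.
    docs = (line.strip() for line in lines if line.strip())
    while True:
        batch = list(islice(docs, batch_size))
        if not batch:
            break
        # Action line + document line per entry; the body must end with "\n".
        yield "".join(f"{action}\n{doc}\n" for doc in batch)
```

Passing the open file object as lines consumes it lazily, so only one batch is held in memory at a time. Each body can then be sent with the client's bulk method, e.g. es.bulk(body=payload) in the 7.x Python client (parameter names differ between client versions, so check yours); the errors field of the response tells you whether any document failed.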

Daniel

