Batch bulk operation - Wikidata large JSON


#1

I'm trying to index the Wikidata entities dump (taken from here) by converting it into a bulk JSON file.

I have transformed the original JSON like this:

{"index": {"_type": "page", "_id": 1}}
{"type":"item","id":"Q26","labels":{"en-gb":{"language":"en-gb","value":"Northern Ireland"},"en":{"language":"en","value":"Northern Ireland"},"it":{"language":"it","value":"Irlanda del Nord"},"fr":{"language":"fr","value":"Irlande du Nord"},"eo":{"language":"eo","value":"Nord-Irlando"},"pl":{"language":"pl","value":"Irlandia P\u00f3\u0142nocna"}, [..]
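For reference, a minimal Python sketch of how a file in this shape could be produced. It is not necessarily how the file above was generated. It assumes the dump has one entity per line inside a JSON array (a leading `[` line, a trailing `]` line, and a comma after each entity); the function name `to_bulk_lines` and the sample input are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def to_bulk_lines(dump_lines: Iterable[str], doc_type: str = "page") -> Iterator[str]:
    """Yield action/source line pairs in the layout the _bulk endpoint expects.

    Assumption: one entity per line, wrapped in "[" ... "]", comma-separated.
    """
    doc_id = 0
    for raw in dump_lines:
        line = raw.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        entity = json.loads(line)  # raises early on a malformed entity
        doc_id += 1
        # Action line, mirroring the post: sequential numeric _id, type "page".
        yield json.dumps({"index": {"_type": doc_type, "_id": doc_id}})
        # Source line, re-serialised so it is guaranteed to stay on one line.
        yield json.dumps(entity, ensure_ascii=False)


# Tiny hand-written sample standing in for the real dump.
sample = [
    "[",
    '{"type":"item","id":"Q26","labels":{"en":{"language":"en","value":"Northern Ireland"}}},',
    '{"type":"item","id":"Q31","labels":{"en":{"language":"en","value":"Belgium"}}}',
    "]",
]
bulk = list(to_bulk_lines(sample))
print("\n".join(bulk))
```

Streaming line by line like this avoids loading the whole dump into memory, which matters for a file of this size.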

and I have tried to index it with this command:

`cat nuovowikidata.json | parallel --pipe -L 2 -N 2000 -j3 'curl -H "Content-Type: application/x-ndjson" -s http://localhost:9200/wikidata_entities/_bulk --data-binary @- > /dev/null'`

but the operation runs very slowly. In 36 hours I have indexed only 216448 docs. How can I improve the speed?
Thanks in advance.
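To spell out what the command does, as far as I understand the GNU parallel options: with `--pipe`, `-L 2` makes each record two lines (one action line plus one source line), `-N 2000` sends 2000 records to each job, and `-j3` runs three `curl` jobs at once. So each request carries 2000 documents, and because the output goes to `/dev/null`, any per-document errors in the bulk response are never seen. A rough Python equivalent of the batching, plus a check of the response, could look like this. It is only a sketch: the helper names `batches`, `post_bulk`, and `count_failures` are hypothetical, and it relies on the bulk response containing an `items` list whose entries carry an `error` object when a document fails.

```python
import json
import urllib.request
from itertools import islice
from typing import Iterable, Iterator, List


def batches(lines: Iterable[str], docs_per_batch: int = 2000) -> Iterator[List[str]]:
    """Group bulk lines into batches of whole action/source pairs."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 2 * docs_per_batch))  # two lines per document
        if not chunk:
            return
        yield chunk


def post_bulk(chunk: List[str], url: str) -> dict:
    """POST one batch as NDJSON and return the parsed bulk response."""
    body = ("\n".join(line.rstrip("\n") for line in chunk) + "\n").encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_failures(bulk_response: dict) -> int:
    """Count the items in a bulk response that carry an "error" object."""
    return sum(
        1
        for item in bulk_response.get("items", [])
        for result in item.values()
        if "error" in result
    )


# Batching demo on placeholder lines (no network access needed).
demo = list(batches(["a1", "s1", "a2", "s2", "a3", "s3"], docs_per_batch=2))
print(demo)
```

With the file from the post, the loop would be along the lines of `for chunk in batches(open("nuovowikidata.json", encoding="utf-8")): print(count_failures(post_bulk(chunk, "http://localhost:9200/wikidata_entities/_bulk")))`, which at least shows whether documents are being rejected rather than just indexed slowly.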

