I'm trying to index the Wikidata entities dump (taken from here) by converting it into a bulk JSON file.
I have transformed the original JSON like this:
{"index": {"_type": "page", "_id": 1}}
{"type":"item","id":"Q26","labels":{"en-gb":{"language":"en-gb","value":"Northern Ireland"},"en": {"language":"en","value":"Northern Ireland"},"it":{"language":"it","value":"Irlanda del Nord"},"fr":{"language":"fr","value":"Irlande du Nord"},"eo":{"language":"eo","value":"Nord-Irlando"},"pl":{"language":"pl","value":"Irlandia P\u00f3\u0142nocna"}, [..]
and I have tried to index it with this command:
`cat nuovowikidata.json | parallel --pipe -L 2 -N 2000 -j3 'curl -H "Content-Type: application/x-ndjson" -s http://localhost:9200/wikidata_entities/_bulk --data-binary @- > /dev/null'`
but the operation runs very slowly: in 36 hours I have indexed only 216448 documents (roughly 1.7 documents per second). How can I improve the indexing speed?
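For reference, the transformation into the bulk format (an action line before each entity) can be sketched roughly as below. This is only a minimal sketch, not necessarily the exact script used: it assumes the dump is a single JSON array with one entity per line, each line ending in a comma, and the function name `to_bulk` and the input file name `wikidata-dump.json` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads a Wikidata-style dump on stdin and writes
# Elasticsearch bulk NDJSON on stdout.
# Assumption: the dump is a JSON array with "[" and "]" on their own
# lines and one entity per line, each followed by a trailing comma.
to_bulk() {
  awk '
    # Skip the array brackets and any blank lines.
    $0 == "[" || $0 == "]" || $0 == "" { next }
    {
      # Drop the trailing comma that separates array elements.
      sub(/,$/, "")
      n++
      # Action line with a sequential numeric _id, as in the sample above.
      printf "{\"index\": {\"_type\": \"page\", \"_id\": %d}}\n", n
      # Source line: the entity itself, unchanged.
      print
    }
  '
}

# Example usage (input file name is hypothetical):
#   to_bulk < wikidata-dump.json > nuovowikidata.json
```

Each entity therefore takes exactly two lines in the output, which is why the `parallel` command above uses `-L 2`.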
Thanks in advance