Batch bulk operation - wikidata large json

geppo · July 18, 2018, 11:27am

I'm trying to index the Wikidata entities dump - taken from here - by creating a bulk JSON file.

I have transformed the initial JSON in this way:

{"index": {"_type": "page", "_id": 1}}
{"type":"item","id":"Q26","labels":{"en-gb":{"language":"en-gb","value":"Northern Ireland"},"en":{"language":"en","value":"Northern Ireland"},"it":{"language":"it","value":"Irlanda del Nord"},"fr":{"language":"fr","value":"Irlande du Nord"},"eo":{"language":"eo","value":"Nord-Irlando"},"pl":{"language":"pl","value":"Irlandia P\u00f3\u0142nocna"}, [..]

and I have tried to index it with this command:

`cat nuovowikidata.json | parallel --pipe -L 2 -N 2000 -j3 'curl -H "Content-Type: application/x-ndjson" -s http://localhost:9200/wikidata_entities/_bulk --data-binary @- > /dev/null'`

but the operation runs very slowly: in 36 hours I have indexed only 216448 docs. How could I improve the speed?
Thanks in advance.
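
For readers who want to reproduce the setup, here is a minimal Python sketch of the two steps described above: turning dump lines into action/source pairs, and grouping those pairs into request bodies of 2000 documents (what `-L 2 -N 2000` asks `parallel` to do). It is not the script used in the post. The function names `to_bulk_lines` and `bulk_bodies` are hypothetical, and the sketch assumes the dump is a JSON array with one entity per line, each followed by a trailing comma, with `[` and `]` on their own lines.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def to_bulk_lines(dump_lines: Iterable[str], doc_type: str = "page") -> Iterator[str]:
    """Yield action/source line pairs for the _bulk API from raw dump lines."""
    doc_id = 0
    for raw in dump_lines:
        # Assumption: each entity line ends with a trailing comma.
        line = raw.strip().rstrip(",")
        # Assumption: the array brackets sit on lines of their own.
        if line in ("", "[", "]"):
            continue
        entity = json.loads(line)
        doc_id += 1  # sequential ids, as in the sample shown in the post
        yield json.dumps({"index": {"_type": doc_type, "_id": doc_id}})
        yield json.dumps(entity, ensure_ascii=False)


def bulk_bodies(bulk_lines: Iterable[str], docs_per_request: int = 2000) -> Iterator[str]:
    """Group bulk lines into request bodies of docs_per_request documents each."""
    it = iter(bulk_lines)
    while True:
        # Every document occupies two lines: the action and the source.
        chunk = list(islice(it, 2 * docs_per_request))
        if not chunk:
            return
        # A bulk request body must end with a newline.
        yield "\n".join(chunk) + "\n"
```

Each string yielded by `bulk_bodies` could be sent as the body of one `_bulk` request with the `application/x-ndjson` content type, in place of the chunks that `parallel --pipe` feeds to `curl`.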

