How to use parallel_bulk function

MKU · May 20, 2019, 6:15am

I have a document to be indexed into Elasticsearch. It is a 200 MB file, so I want to use parallel bulk.
The file is in this format:
[{},{},{},{}]
Basically, it is an array of objects.

But when I try to index it using parallel bulk, nothing is indexed into Elasticsearch.
How do I index data using parallel bulk?
Do I need to convert the data into a particular format before using parallel bulk? If so, please specify the format.

Christian_Dahlqvist · May 20, 2019, 6:20am

The bulk interface is for sending multiple documents (not necessarily all of them) in a single request, and the request has to follow the format described in the documentation. It is recommended that bulk requests be limited to around 5 MB in size, so you should break your data up into multiple requests.
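A minimal sketch of that format in plain Python, with no client involved: each document becomes an action/metadata line followed by its source line, and the pairs are grouped so that each request body stays under a byte limit. The function name `bulk_bodies`, the use of the `index` action and the 5 MB default are illustrative assumptions; the Python helpers do similar grouping internally through their `chunk_size` and `max_chunk_bytes` parameters.

```python
import json
from typing import Iterable, Iterator, List


def bulk_bodies(docs: Iterable[dict], index: str,
                max_bytes: int = 5 * 1024 * 1024) -> Iterator[str]:
    """Hypothetical helper: yield NDJSON _bulk request bodies of at most ~max_bytes."""
    body: List[str] = []
    size = 0
    for doc in docs:
        # Each document needs an action/metadata line followed by its source line.
        pair = json.dumps({"index": {"_index": index}}) + "\n" + json.dumps(doc) + "\n"
        pair_size = len(pair.encode("utf-8"))
        # Start a new body if adding this pair would exceed the limit.
        # A single document larger than max_bytes still gets a body of its own.
        if body and size + pair_size > max_bytes:
            yield "".join(body)
            body, size = [], 0
        body.append(pair)
        size += pair_size
    if body:
        # Every body ends with a newline, as the _bulk endpoint requires.
        yield "".join(body)
```

Each yielded string could then be sent as the body of one `_bulk` request.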

MKU · May 20, 2019, 6:23am

This format works with the bulk API, but not with parallel_bulk or streaming_bulk. Basically, I want to bulk index the data in chunks of a desired size.

Christian_Dahlqvist · May 20, 2019, 6:25am

What do you mean by this? How are you indexing the data?

MKU · May 20, 2019, 6:26am

Please check out the parallel and streaming bulk sections here: https://elasticsearch-py.readthedocs.io/en/master/helpers.html

Christian_Dahlqvist · May 20, 2019, 6:28am

It would have helped if you had explained in the first post that you are using the Python client. Can you show us your code? Are you parsing the input file and processing the objects one by one in the code?
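For the Python helpers, parsing the input file means turning it into an iterable of action dicts, since `bulk`, `streaming_bulk` and `parallel_bulk` take actions rather than a file name. A minimal sketch, assuming the file holds a single JSON array like the one described in this thread; `read_actions` and its optional `index` override are hypothetical names, not part of elasticsearch-py.

```python
import json
from typing import Iterator, Optional


def read_actions(path: str, index: Optional[str] = None) -> Iterator[dict]:
    """Hypothetical helper: yield one bulk action dict per object in a JSON-array file."""
    with open(path, encoding="utf-8") as f:
        # json.load reads the whole array into memory; for a ~200 MB file this is
        # usually workable, but Python objects can take considerably more RAM than the file.
        docs = json.load(f)
    for doc in docs:
        action = dict(doc)  # shallow copy so the parsed data is not mutated
        if index is not None:
            # Route every document to one index, overriding any "_index" in the file.
            action["_index"] = index
        yield action
```

Objects that already carry `_index` and `_source` keys, like the sample shown later in the thread, can be passed to the helpers unchanged.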

MKU · May 20, 2019, 7:46am

```python
import json

import requests
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk, parallel_bulk, streaming_bulk
from requests.auth import AuthBase

requests.packages.urllib3.disable_warnings()

es = Elasticsearch('https://localhost:9200', ca_certs=False, verify_certs=False,
                   headers={'content-type': 'application/x-ndjson',
                            'Authorization': 'Bearer P/N1UniTZQ~'})

parallel_bulk(client=es, actions="ldif2.json", index="user_volvo", chunk_size=2)
```

This is the Python code I used.

The data file ldif2.json is in this format:

[{"_index": "abc", "_type": "user", "_source": {"dn": " cn=abc,o=VCC", "changetype": " add", "mail": " abc@corp.com", "surname": " satya2"}} ,{},{},{}.......]

Please refer to these Stack Overflow questions:
https://stackoverflow.com/questions/54962685/how-to-improve-parallel-bulk-from-python-code-for-elastic-insert
https://stackoverflow.com/questions/54212958/elasticsearch-python-parallel-bulk-can-not-insert-data
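Two things in the code above would explain why nothing is indexed. First, `parallel_bulk` is a lazy generator: calling it only creates the generator, and no request is sent until something iterates over it. Second, `actions` must be an iterable of action dicts; a string such as `"ldif2.json"` is iterable too, but only character by character, so the file is never opened. The sketch below consumes the generator and collects failures instead of raising. `index_with_parallel_bulk` is a hypothetical wrapper, and it assumes `raise_on_error=False` is forwarded by `parallel_bulk` to its per-chunk processing, as in recent client versions; check this against the version you use.

```python
from typing import Iterable, List, Tuple


def index_with_parallel_bulk(es, actions: Iterable[dict], thread_count: int = 4,
                             chunk_size: int = 500) -> Tuple[int, List[dict]]:
    """Hypothetical wrapper: run parallel_bulk to completion, return (successes, failed_items)."""
    # Imported here so this module can be loaded without the client installed.
    from elasticsearch.helpers import parallel_bulk

    successes, failures = 0, []
    # parallel_bulk returns a lazy generator: nothing is sent until it is iterated.
    # Each yielded item is an (ok, info) tuple for one document.
    for ok, info in parallel_bulk(es, actions, thread_count=thread_count,
                                  chunk_size=chunk_size, raise_on_error=False):
        if ok:
            successes += 1
        else:
            failures.append(info)
    return successes, failures
```

For example, with `es` created as in the post and `actions` being the list parsed from ldif2.json with `json.load`, `index_with_parallel_bulk(es, actions)` would return the number of successful items and the failed ones. If per-item results are not needed, `collections.deque(parallel_bulk(es, actions), maxlen=0)` is a common way to drain the generator.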

system · June 17, 2019, 7:46am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Helpers.parallel_bulk in Python not working?	Elasticsearch	6	19577	July 5, 2017
Elasticsearch parallel bulk using Python - issue with json	Elasticsearch	2	879	October 8, 2018
Index from pandas to Elastic Search Using BULK and Parallel BULK	Elasticsearch	3	3178	May 10, 2019
Elasticsearch Python Lib	Elasticsearch	1	167	August 29, 2023
Parallel_bulk record format (update, insert, etc.)	Elasticsearch	2	877	July 9, 2017