How to use the parallel_bulk function

I have a document to index into Elasticsearch. It is a 200 MB file, so I want to use parallel_bulk.
The file is in this format:
[{},{},{},{}]
Basically, it is an array of objects.

But when I try to index it using parallel_bulk, nothing gets indexed into Elasticsearch.
How do I index data using parallel_bulk?
Do I need to convert the data into a particular format before using parallel_bulk? If so, please specify the format.

The bulk interface is for sending multiple documents (not necessarily all of them) in a single request, and the request body has to follow the newline-delimited JSON format described in the documentation. It is recommended that bulk requests be limited to around 5 MB, so you should break your data up into multiple requests.
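For reference, a bulk request body pairs an action line with a source line for every document, and every line ends with a newline. A minimal pure-Python sketch that turns a list of documents into such bodies, starting a new body before one would exceed roughly 5 MB; the function name bulk_bodies and the plain "index" action are assumptions for illustration:

```python
import json
from typing import Iterable, Iterator, List


def bulk_bodies(docs: Iterable[dict], index: str,
                max_bytes: int = 5 * 1024 * 1024) -> Iterator[str]:
    """Yield newline-delimited JSON bodies for the _bulk endpoint.

    Each body stays under max_bytes, except that a single document larger
    than the limit still gets a body of its own.
    """
    lines: List[str] = []
    size = 0
    for doc in docs:
        # One action/metadata line followed by the document source line.
        pair = json.dumps({"index": {"_index": index}}) + "\n" + json.dumps(doc) + "\n"
        pair_size = len(pair.encode("utf-8"))
        # Flush the current body if adding this pair would exceed the limit.
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)
```

Each yielded string could then be sent as one request to the _bulk endpoint with the application/x-ndjson content type.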

Using this format, the bulk API works, but parallel_bulk and streaming_bulk do not. Basically, I want to bulk index the data in chunks of a size I choose.

What do you mean by this? How are you indexing the data?

Please check out the parallel_bulk and streaming_bulk sections here: https://elasticsearch-py.readthedocs.io/en/master/helpers.html
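The helpers on that page already split their input into requests: chunk_size caps the number of documents per request and max_chunk_bytes caps its size in bytes. If you would rather control the batches yourself and pass each one to the plain bulk helper, a minimal sketch of count-based batching (batches is a hypothetical helper, not part of elasticsearch-py):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        # islice pulls the next `size` items without materializing the rest.
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

For example, each batch could go to elasticsearch.helpers.bulk(es, batch); in most cases simply setting chunk_size on the helper is the simpler choice.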

It would have helped if you had explained in the first post that you are using the Python client. Can you show us your code? Are you parsing the input file and handling the objects one by one in your code?

import json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk, parallel_bulk, streaming_bulk
from requests.auth import AuthBase
import requests

requests.packages.urllib3.disable_warnings()

es = Elasticsearch(
    'https://localhost:9200',
    ca_certs=False,
    verify_certs=False,
    headers={'content-type': 'application/x-ndjson', 'Authorization': 'Bearer P/N1UniTZQ~'},
)

parallel_bulk(client=es, actions="ldif2.json", index="user_volvo", chunk_size=2)

This is the Python code I used.

The data in ldif2.json is in this format:

[{"_index": "abc", "_type": "user", "_source": {"dn": " cn=abc,o=VCC", "changetype": " add", "mail": " abc@corp.com", "surname": " satya2"}} ,{},{},{}.......]
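The bulk helpers expect actions to be an iterable of dicts, not a file name, so the file has to be parsed first. Records shaped like the sample above (with _index and _source) can be passed through as they are. A minimal sketch, assuming the whole file fits in memory; load_actions and default_index are hypothetical names:

```python
import json
from typing import List, TextIO


def load_actions(fp: TextIO, default_index: str) -> List[dict]:
    """Parse a JSON array of bulk actions and make sure each one names an index."""
    records = json.load(fp)  # reads the whole array (here ~200 MB) into memory
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of objects")
    actions = []
    for record in records:
        action = dict(record)  # shallow copy so the parsed records stay untouched
        # A per-action "_index" (like "abc" in the sample) is kept as-is;
        # only records without one get the default.
        action.setdefault("_index", default_index)
        actions.append(action)
    return actions
```

Usage would be along the lines of opening ldif2.json with encoding="utf-8" and calling load_actions(fp, "user_volvo"). Note that because every sample record carries "_index": "abc", those documents go to abc: a per-action _index takes precedence over the index passed to the helper.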

Please refer to these Stack Overflow questions:
https://stackoverflow.com/questions/54962685/how-to-improve-parallel-bulk-from-python-code-for-elastic-insert
https://stackoverflow.com/questions/54212958/elasticsearch-python-parallel-bulk-can-not-insert-data
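In the code above, parallel_bulk is a generator, so calling it without iterating over the result sends nothing to Elasticsearch; in addition, iterating over the string "ldif2.json" yields its characters rather than the documents in the file. A minimal sketch that passes parsed action dicts and drains the generator, assuming the Python client's elasticsearch.helpers.parallel_bulk; index_actions and summarize_results are hypothetical names:

```python
from typing import Any, Iterable, List, Tuple


def summarize_results(results: Iterable[Tuple[bool, Any]]) -> Tuple[int, List[Any]]:
    """Drain a bulk-helper result iterator, counting successes and collecting failures."""
    succeeded = 0
    failed: List[Any] = []
    for ok, item in results:  # iterating is what makes parallel_bulk do its work
        if ok:
            succeeded += 1
        else:
            failed.append(item)
    return succeeded, failed


def index_actions(client, actions, index: str, chunk_size: int = 500, thread_count: int = 4):
    """Index an iterable of action dicts with parallel_bulk and consume the results."""
    # Imported here so summarize_results can be used without the client installed.
    from elasticsearch.helpers import parallel_bulk

    results = parallel_bulk(
        client,
        actions,
        thread_count=thread_count,
        chunk_size=chunk_size,  # documents per bulk request
        raise_on_error=False,  # report per-document failures instead of raising
        index=index,  # default index; a per-action "_index" still takes precedence
    )
    return summarize_results(results)
```

With the client from the question and actions being the list of dicts parsed from ldif2.json, a call might look like succeeded, failed = index_actions(es, actions, "user_volvo"). If the per-document results are not needed, collections.deque(parallel_bulk(...), maxlen=0) is a common way to exhaust the generator.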
