I have a folder with 590,035 JSON files. Each file is a document that has to be indexed. If I index each document one at a time from Python, it takes more than 30 hours. How do I index these documents quickly?
Note - I've seen the bulk API, but that seems to require merging all the files into one, which takes a similar amount of time as above.
Please tell me how to improve the speed. Thank you.
Maybe you can use Filebeat, or have a look at FSCrawler, which has a JSON file importer mode. See https://fscrawler.readthedocs.io/en/latest/admin/fs/local-fs.html
I don't want to index the files themselves; I want to index the JSON content of the files. I don't see how FSCrawler or Filebeat can help with that.
I have not used FSCrawler, but I know that Filebeat and Logstash can ship JSON to Elasticsearch.
Data in Elasticsearch is stored as JSON documents, whatever you call them. Filebeat can read your JSON files and ship the content, one document at a time, to Elasticsearch, which will index each complete JSON object as an individual document.
I think Filebeat expects one JSON object per line, though, so it depends a bit on the format of your JSON files.
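If the files are pretty-printed (multi-line) JSON, one option is to rewrite each file as a single compact line so a line-oriented shipper like Filebeat can treat each file's content as one event. A minimal sketch, assuming one JSON object per file (the folder layout and in-place rewrite are assumptions for illustration):

```python
import json
from pathlib import Path

def flatten_json_files(folder):
    """Rewrite every .json file in `folder` as one compact line.

    A line-oriented shipper (e.g. Filebeat with its JSON decoding
    options) can then read each file as a single JSON event.
    """
    for path in Path(folder).glob("*.json"):
        doc = json.loads(path.read_text(encoding="utf-8"))
        # separators=(",", ":") strips whitespace; one trailing newline per file
        path.write_text(json.dumps(doc, separators=(",", ":")) + "\n",
                        encoding="utf-8")
```

This keeps the files separate (no merging), only changing their layout to one object per line.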
I have not tried to do this directly from Python, but it should definitely be possible. I have Python scripts as parts of ingestion pipelines, and they have no problem doing thousands of JSON objects per second. But it depends on the complexity and size of the JSON objects you want to index and the hardware resources you have.
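For reference, the official Python client's bulk helpers take a generator of actions, so the files never need to be merged on disk; the client batches documents over the wire. A sketch of the generator side (the index name, `_id` scheme, and folder layout are assumptions, not from the thread):

```python
import json
from pathlib import Path

def generate_actions(folder, index_name):
    """Yield one bulk action per JSON file in `folder`.

    Nothing is merged on disk; each file is read and turned into an
    action dict the bulk helpers understand. Using the file stem as
    `_id` is just an illustrative choice.
    """
    for path in Path(folder).glob("*.json"):
        doc = json.loads(path.read_text(encoding="utf-8"))
        yield {"_index": index_name, "_id": path.stem, "_source": doc}

# With the official elasticsearch client installed, this generator
# would be fed to a bulk helper, e.g.:
#
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch("http://localhost:9200")
#   for ok, info in helpers.parallel_bulk(
#           es, generate_actions("docs", "myindex"),
#           thread_count=4, chunk_size=500):
#       if not ok:
#           print("failed:", info)
```

Batching (and optionally parallelising) the requests this way is usually what turns a 30-hour one-document-per-request loop into minutes, though the exact numbers depend on document size and hardware.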
That's what I meant as well.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.