I have a 26 GB mbox file called bigFile.mbox.
I want to index bigFile.mbox in Elasticsearch. How can I do this?
Here is what I have tried so far:
1. I started writing a Python script that converts bigFile.mbox to bigFile.json.
The script works correctly on small files (<= 1 MB), but it fails when converting bigFile.mbox, which is 26 GB.
2. I created a Python script (splitter.py) that splits an mbox file into several files of a given size. However, this script takes a long time to split the 26 GB file, and it generates a huge number of small 1 MB mbox files, which makes indexing in Elasticsearch much slower.
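A streaming variant of step 1 might avoid both the memory problem and the need to split the file. Since the original conversion script is not shown, this sketch assumes the failure comes from collecting every message before writing a single large JSON document. It uses Python's standard `mailbox` module, which scans the file once to record message offsets and then reads messages one at a time, and it writes one JSON object per line (NDJSON). The helper names and the field names (`from`, `to`, `subject`, `date`, `message_id`, `body`) are hypothetical, and keeping only `text/plain` parts is an illustrative simplification, not a required schema.

```python
import json
import mailbox
from email.header import decode_header, make_header


def header_text(value):
    """Decode a possibly RFC 2047-encoded header to plain text ("" if absent)."""
    if value is None:
        return ""
    try:
        return str(make_header(decode_header(value)))
    except Exception:  # malformed encoded words: fall back to the raw value
        return str(value)


def body_text(msg):
    """Concatenate the text/plain parts of a message, decoding leniently."""
    chunks = []
    for part in msg.walk():  # walk() also yields the message itself if not multipart
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes after undoing base64/QP
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def mbox_to_ndjson(mbox_path, out_path):
    """Stream an mbox into NDJSON (one message per line); return the count."""
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        # errors="replace" guards against undecodable bytes left in raw headers
        with open(out_path, "w", encoding="utf-8", errors="replace") as out:
            for msg in box:  # messages are read one at a time, not all at once
                doc = {
                    "from": header_text(msg["From"]),
                    "to": header_text(msg["To"]),
                    "subject": header_text(msg["Subject"]),
                    "date": header_text(msg["Date"]),
                    "message_id": header_text(msg["Message-ID"]),
                    "body": body_text(msg),
                }
                out.write(json.dumps(doc, ensure_ascii=False) + "\n")
                count += 1
    finally:
        box.close()
    return count
```

Usage would be `mbox_to_ndjson("bigFile.mbox", "bigFile.json")`; note that the output is then one document per line rather than a single JSON array, which is what makes it possible to process the result without loading it whole.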
Is this the most efficient way to index a 26 GB mbox file in Elasticsearch, or is there a better approach?
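For the indexing itself, the Elasticsearch `_bulk` endpoint takes newline-delimited pairs of lines (an action line followed by the document source), so a file holding one JSON object per line could be batched by byte size instead of splitting the mbox into many small files. This sketch only builds the request bodies. The helper names `read_ndjson` and `bulk_payloads`, the index name `mbox`, and the 5 MB default batch size are assumptions to tune, not values taken from the question.

```python
import json


def read_ndjson(path):
    """Yield one parsed document per non-empty line of an NDJSON file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def bulk_payloads(docs, index_name, max_bytes=5 * 1024 * 1024):
    """Yield _bulk request bodies (bytes), each at most max_bytes.

    Every document becomes two NDJSON lines: an "index" action and the source.
    A single document larger than max_bytes is still emitted, alone, in its
    own payload.
    """
    action = (json.dumps({"index": {"_index": index_name}}) + "\n").encode("utf-8")
    batch, size = [], 0
    for doc in docs:
        entry = action + (json.dumps(doc, ensure_ascii=False) + "\n").encode("utf-8")
        if batch and size + len(entry) > max_bytes:
            yield b"".join(batch)  # flush before this entry would exceed the cap
            batch, size = [], 0
        batch.append(entry)
        size += len(entry)
    if batch:
        yield b"".join(batch)  # every body ends with "\n", as _bulk requires
```

Each body would then be sent as a `POST` to `/_bulk` with `Content-Type: application/x-ndjson`, checking the `errors` flag in each response. The official `elasticsearch` Python client also provides `elasticsearch.helpers.streaming_bulk`, which performs this chunking itself.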
Thanks in advance.