Hello everybody
I have an mbox file of about 26 GB (this file is called bigFile.mbox).
I want to index bigFile.mbox in Elasticsearch. How can I do this?
Here's what I tried to put in place:
1. I started writing a Python script that converts bigFile.mbox to bigFile.json.
The script works correctly on small files (<= 1 MB),
but it fails when converting bigFile.mbox, which is 26 GB.
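For what it's worth, converting the whole file into one giant JSON document will exhaust memory; one alternative is to stream the messages out one at a time as newline-delimited JSON using Python's standard `mailbox` module. This is only a minimal sketch: the field names (`subject`, `from`, `date`, `body`) are illustrative assumptions, not a required mapping.

```python
import json
import mailbox

def mbox_to_ndjson(mbox_path, out_path):
    """Stream one JSON document per email instead of building one giant JSON.

    Assumption: the chosen fields are illustrative; adapt them to whatever
    mapping you plan to use in Elasticsearch.
    """
    box = mailbox.mbox(mbox_path)
    with open(out_path, "w", encoding="utf-8") as out:
        for msg in box:
            doc = {
                "subject": msg.get("Subject", ""),
                "from": msg.get("From", ""),
                "date": msg.get("Date", ""),
                # Multipart bodies need real MIME handling; skipped in this sketch.
                "body": msg.get_payload() if not msg.is_multipart() else "",
            }
            out.write(json.dumps(doc) + "\n")
```

Because each message is written as soon as it is read, memory use stays roughly constant regardless of the mbox size.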
2. I created a Python script (splitter.py) that splits an mbox file into several files of a given size.
But this script takes a long time to split a 26 GB file, and it
generates a huge number of small 1 MB mbox files, which makes indexing into
Elasticsearch much slower.
Please, what is the most efficient way to index a 26 GB mbox file in Elasticsearch?
You probably don't want to index the whole mailbox as one document; you probably want to index every single email that is in the mailbox.
My guess is that you want to be able to find an email, not a mailbox, which contains XYZ.
So you need to find a way to read every single email and send it to Elasticsearch. Then you could perhaps use the ingest-attachment plugin to index each email.
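A rough sketch of that approach, using the stdlib `mailbox` module to read the emails one by one and the bulk helper of the official Python client to send them. The index name `emails` and the document fields are assumptions for illustration, not a prescribed schema:

```python
import mailbox

def generate_actions(mbox_path, index_name="emails"):
    """Yield one bulk-API action per email in the mbox file.

    Assumption: index name and field choices are illustrative only.
    """
    for i, msg in enumerate(mailbox.mbox(mbox_path)):
        yield {
            "_index": index_name,
            "_id": i,
            "_source": {
                "subject": msg.get("Subject", ""),
                "from": msg.get("From", ""),
                "date": msg.get("Date", ""),
                # Multipart messages would need proper MIME walking here.
                "body": msg.get_payload() if not msg.is_multipart() else "",
            },
        }

# To actually index (requires the elasticsearch client and a running cluster):
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch("http://localhost:9200")
# helpers.bulk(es, generate_actions("bigFile.mbox"))
```

Since `generate_actions` is a generator, emails are read and sent in batches rather than loaded all at once, so a 26 GB mailbox can be indexed without splitting it first.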