How can I parse/index PST files into Elasticsearch?

I am able to parse JSON files into Elasticsearch. Is there any way to parse/index Microsoft Outlook PST files into Elasticsearch indices?

Thank you very much.

Is there a way to export a PST into CSV or some other plain file format?

If so, you could then read it in via Logstash.

No, I can't convert it to another file format. I have a huge source, several hundred GB and growing, so I need a direct solution.
Thank you for your idea.

Ignoring the basic problem of parsing the file format, you'll need to think about how you want to represent all the complexities of an email archive.

Do you want to trim the original text in email replies? Do you need to know if emails were read or not? Do you want the routing headers? Do you also want to parse attachments? What doc types (PDF, Word, Excel...)? Do you want to unzip zipped attachments? Do you just want MD5 signatures of files? Do you want to virus scan?

There are lots of questions. If you Google "PST parser" you'll find several answers for how to physically parse the files, but you'll have to answer the above questions about how to prepare the content for representation inside Elasticsearch, and then write some custom code that implements those choices.
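To make a few of those choices concrete, here is a rough Python sketch of the "custom code" step, using only the standard library. It assumes that whatever PST parser you pick can hand you each message as raw RFC 822 bytes, which is an assumption on my part and not something every parser does. The function names (`message_to_doc`, `to_bulk_lines`), the document field names, and the `emails` index name are all made up for illustration. It keeps the plain-text body, optionally keeps the routing headers, and stores only an MD5 signature per attachment rather than the attachment content.

```python
import hashlib
import json
from email import message_from_bytes, policy
from email.message import EmailMessage


def message_to_doc(raw_bytes, include_routing_headers=False):
    """Turn one RFC 822 message into a flat dict (hypothetical schema)."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)

    def header(name):
        value = msg[name]
        return str(value) if value is not None else None

    # Convert the Date header to ISO 8601 when it is present and parseable.
    date_header = msg["Date"]
    date = None
    if date_header is not None and date_header.datetime is not None:
        date = date_header.datetime.isoformat()

    body_part = msg.get_body(preferencelist=("plain",))
    doc = {
        "subject": header("Subject"),
        "from": header("From"),
        "to": header("To"),
        "date": date,
        "body": body_part.get_content() if body_part is not None else "",
        "attachments": [],
    }
    if include_routing_headers:
        doc["received"] = [str(h) for h in msg.get_all("Received", [])]

    # Design choice: store metadata and an MD5 signature, not the file itself.
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        doc["attachments"].append({
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "size": len(payload),
            "md5": hashlib.md5(payload).hexdigest(),
        })
    return doc


def to_bulk_lines(docs, index_name="emails"):
    """Build a newline-delimited body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# Stand-in for a message coming out of a PST parser.
sample = EmailMessage()
sample["Subject"] = "Quarterly report"
sample["From"] = "alice@example.com"
sample["To"] = "bob@example.com"
sample.set_content("See attached.")
sample.add_attachment(b"hello", maintype="application",
                      subtype="octet-stream", filename="report.bin")

doc = message_to_doc(sample.as_bytes())
print(to_bulk_lines([doc]))
```

The printed text is what you would POST to the `_bulk` endpoint in batches. Parsing attachment content (PDF, Word, Excel), unzipping archives, and trimming quoted reply text would each need additional code on top of this.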

