How can I parse/index PST files into Elasticsearch?


(Fardin Behboudi) #1

I am able to parse JSON files into Elasticsearch. Is there any way to parse/index Microsoft Outlook PST files into Elasticsearch indexes?

Thank you very much.


(Jason Kopacko) #2

Is there a way to export a PST into CSV or some other plain file format?

If so, you could then read it in via Logstash.
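If Logstash is not a hard requirement, the same idea can be sketched in plain Python: read the exported CSV and turn each row into a pair of lines for the Elasticsearch `_bulk` API. This is only a rough sketch. The function name `csv_to_bulk`, the index name `emails`, and the column names are made up for illustration, and it assumes the export has a header row.

```python
import csv
import io
import json


def csv_to_bulk(csv_text, index_name):
    """Build an Elasticsearch _bulk request body (NDJSON) from CSV text.

    Hypothetical helper: each CSV row becomes one "index" action line
    followed by one document line, as the _bulk API expects.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(row))
    if not lines:
        return ""
    # The _bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Assumed column names, purely for illustration.
sample = "subject,sender\nHello,alice@example.com\nLunch?,bob@example.com\n"
bulk_body = csv_to_bulk(sample, "emails")
```

The resulting string could then be POSTed to the `_bulk` endpoint with the `application/x-ndjson` content type. For a source of several hundred GB you would want to send it in batches rather than as one body.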


(Fardin Behboudi) #3

No, I can't convert it to another file format. I have a huge source, several hundred GB and growing, so I need a direct solution.
Thank you for your idea.


(Mark Harwood) #4

Ignoring the basic problem of parsing the file format, you'll need to think about how you want to represent all the complexities in an email archive.

Do you want to trim the original text in email replies? Do you need to know if emails were read or not? Do you want the routing headers? Do you also want to parse attachments? What doc types (PDF, Word, Excel...)? Do you want to unzip zipped attachments? Do you just want MD5 signatures of files? Do you want to virus scan?

There are lots of questions. If you Google "PST parser" you'll find several answers for how to physically parse the files, but you'll have to answer the above questions on how to prepare all the content for representation inside Elasticsearch and write some custom code that implements these choices.
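To make a few of those choices concrete, here is a minimal sketch of the "custom code" step. It assumes a PST parser has already handed you each message as a plain dict. The dict layout, the function names `trim_quoted_reply` and `to_document`, and the quote-detection rule are all assumptions for illustration, not any parser's real output.

```python
import hashlib
import re

# Rough heuristic (an assumption): lines that commonly introduce quoted text.
QUOTE_HEADER = re.compile(r"^(On .+ wrote:|-----Original Message-----)\s*$")


def trim_quoted_reply(body):
    """Keep only the text above the first quoted-reply marker."""
    kept = []
    for line in body.splitlines():
        if QUOTE_HEADER.match(line) or line.startswith(">"):
            break
        kept.append(line)
    return "\n".join(kept).strip()


def to_document(message, keep_routing_headers=False):
    """Map one parsed message (hypothetical dict layout) to an index document."""
    doc = {
        "subject": message["subject"],
        "sender": message["sender"],
        "is_read": message["is_read"],
        "body": trim_quoted_reply(message["body"]),
        # Store only a signature per attachment instead of its content.
        "attachments": [
            {"filename": name, "md5": hashlib.md5(data).hexdigest()}
            for name, data in message["attachments"]
        ],
    }
    if keep_routing_headers:
        doc["headers"] = message["headers"]
    return doc


example = {
    "subject": "Re: Lunch",
    "sender": "alice@example.com",
    "is_read": True,
    "body": "Thanks, see you then.\n\nOn Mon, Jan 1, 2018, Bob wrote:\n> Lunch?",
    "headers": {"Received": "from mail.example.com"},
    "attachments": [("note.txt", b"hello")],
}
doc = to_document(example)
```

Each decision (trim replies or not, keep routing headers or not, hash attachments versus extracting their text) ends up as a small branch like these, which is why the answers need settling before the indexing code is written.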

Cheers
Mark

