Perl script to count # of messages in PST

(@chrism)
Posts: 97
Trusted Member
Topic starter
 

I've written a program in Perl that can count the number of message objects in a Unicode PST file. Useful to quickly discover the number of messages, without opening the file in EnCase or Outlook. A message object is defined by Microsoft as an e-mail message, appointment or contact.

Usage is btree-parse.pl <pst file>.

This program actually parses the complete node tree structure of a PST file and saves the output in an XML file. The XML file can get quite large: testing produced a 70MB XML file from a 1.5GB PST file. It parses both the Block BTREE and Node BTREE structures. Because of this, the program can easily be extended to count the number of attachment objects too.
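As a rough illustration of the counting step (not the Perl script's actual code), here is a minimal Python sketch. It assumes the node IDs (NIDs) have already been pulled out of the tree, and that a "message object" corresponds to the NID type the [MS-PST] specification calls NID_TYPE_NORMAL_MESSAGE. The function names are made up for this example.

```python
from collections import Counter

# NID type values as listed in the [MS-PST] specification.
# The type is stored in the low 5 bits of each 32-bit NID.
NID_TYPE_NORMAL_FOLDER = 0x02
NID_TYPE_NORMAL_MESSAGE = 0x04
NID_TYPE_ATTACHMENT = 0x05


def nid_type(nid: int) -> int:
    """Return the node type encoded in the low 5 bits of a NID."""
    return nid & 0x1F


def count_node_types(nids):
    """Count folders, messages and attachments in an iterable of NIDs.

    Hypothetical helper: the caller is assumed to have collected the NIDs
    already (for example from parsed B-tree entries).
    """
    counts = Counter(nid_type(n) for n in nids)
    return {
        "folders": counts[NID_TYPE_NORMAL_FOLDER],
        "messages": counts[NID_TYPE_NORMAL_MESSAGE],
        "attachments": counts[NID_TYPE_ATTACHMENT],
    }
```

Counting attachments then only means feeding in NIDs of a different type, which is presumably why the extension is described as easy.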

The program reads the PST file on a binary level, no APIs needed.
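To give an idea of what reading the file "on a binary level" involves, the sketch below checks the first few header bytes in Python. It is not taken from the script. The offsets and values (the "!BDN" signature at offset 0, the "SM" client signature at offset 8 and the 16-bit version field at offset 10) follow my reading of the [MS-PST] specification, and the function name is an assumption for illustration.

```python
import struct


def read_pst_format(path):
    """Return "ANSI", "Unicode" or "unknown" for a PST file.

    Sketch only: inspects the first 12 bytes of the header and nothing else.
    """
    with open(path, "rb") as f:
        header = f.read(12)

    # dwMagic: every PST/OST file starts with the bytes "!BDN".
    if len(header) < 12 or header[0:4] != b"!BDN":
        raise ValueError("missing !BDN signature")

    # wMagicClient: "SM" marks a PST (as opposed to an OST).
    if header[8:10] != b"SM":
        raise ValueError("not a PST client signature")

    # wVer: little-endian 16-bit format version.
    (w_ver,) = struct.unpack_from("<H", header, 10)
    if w_ver in (14, 15):
        return "ANSI"
    if w_ver >= 23:
        return "Unicode"
    return "unknown"
```

A check like this would let a tool refuse ANSI files early, since the script described here targets Unicode PST files only.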

The program will eventually be extended to extract the raw messages from a given PST file, with the corresponding folder structure.

Source code is here: http://code.google.com/p/pst-parser/.

I have to note that I am not a great Perl programmer, and I apologise if my code is hard to read. If anyone would like to test it - I would love to hear your feedback.

 
Posted : 30/08/2011 8:38 pm
(@jhup)
Posts: 1442
Noble Member
 

Excellent! I have been working on stats of e-mails using some nasty workarounds.

I have not run it yet, but I am already hoping for the best.

Can you also dump the e-mail header fields, each into its own column/object (at least to/from/subject/sent/received)?

 
Posted : 31/08/2011 12:57 am
(@chrism)
Posts: 97
Trusted Member
Topic starter
 

Hi jhup,

The Perl script I have written parses the NBT layer of the PST file, but it is written in such a way that it can be extended to read the message layer. Hopefully, once this is done, it will be able to read the header information quite quickly.

Unfortunately the message layer is obfuscated by Microsoft, but they have released the algorithm, so I'm working on a C++ program that will be able to decode it - then I can go from there.
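For readers wondering what the decoding amounts to: one of the encodings described in the [MS-PST] specification is a byte-for-byte substitution through a fixed 256-entry table. The Python sketch below shows that idea only. It is not the C++ program mentioned here, and the table in it is a toy placeholder, NOT Microsoft's published table, which would have to be copied from the specification.

```python
def decode_permute(data: bytes, decode_table: bytes) -> bytes:
    """Undo a byte-substitution encoding using a 256-entry lookup table."""
    if len(decode_table) != 256:
        raise ValueError("decode table must have exactly 256 entries")
    # bytes.translate maps each byte b to decode_table[b].
    return data.translate(decode_table)


# Placeholder tables for demonstration only (an arbitrary permutation and
# its inverse). Substitute the real decode table from the specification.
toy_encode = bytes((i * 7 + 3) % 256 for i in range(256))
toy_decode = bytes(toy_encode.index(i) for i in range(256))

encoded = b"Subject: test".translate(toy_encode)
decoded = decode_permute(encoded, toy_decode)
```

Because the substitution is a plain table lookup per byte, decoding is fast even for large files.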

Please do let me know how you get on, how it works on your machine, etc.

 
Posted : 31/08/2011 1:59 pm
(@jhup)
Posts: 1442
Noble Member
 

I actually dump the PST into another format, and extract the data that way. Very painful, tedious and prone to errors.

 
Posted : 31/08/2011 8:44 pm
(@chrism)
Posts: 97
Trusted Member
Topic starter
 

Update: I have found a way to decode a PST's message layer. I've written a small C++ program that will accept a data file and dump out the decoded version. I've uploaded it (along with the compiled executable) to the Google Code page given above if anyone wishes to view it.

The plan is to merge these two programs together, so that the message layer can be fully read and the useful information parsed out quickly (we are talking about seconds for files less than 1GB).

 
Posted : 02/09/2011 8:51 pm
(@hydrocloricacid)
Posts: 37
Eminent Member
 

You may find the following link useful.
It's an open-source program for extracting items from PST/OST files, and in the download area the author has some docs on the format of PST/OST files.

http://sourceforge.net/projects/libpff/

 
Posted : 05/09/2011 8:47 am
(@chrism)
Posts: 97
Trusted Member
Topic starter
 

I'm currently following the PST file format documentation published on the Microsoft website; it took me a while to read it and fully understand it.

From what I can gather (please correct me if I'm wrong), libpff uses Microsoft's MAPI functions to read PST files. I'm writing my program to not use any API calls, mainly so I can teach myself Perl, but also so that I can make the program as flexible as possible.

 
Posted : 05/09/2011 3:26 pm
(@binarybod)
Posts: 272
Reputable Member
 

"From what I can gather (please correct me if I'm wrong), the libpff uses Microsoft's MAPI functions to read PST files. I'm writing my program to not use any API calls, mainly so I can teach myself Perl, but also so that I can make the program as flexible as possible."

This confused me at first because I have libpff and the tools installed on my GNU/Linux machine which has no Windows API and never will have.

Digging around the source code, I found this in libpff.c:
#if defined( WINAPI )
#include <windows.h>
#endif

Without digging any deeper into the configuration files and such, it looks as though libpff uses the API if it is available but uses its own functions if not.

Paul

 
Posted : 05/09/2011 5:20 pm
(@chrism)
Posts: 97
Trusted Member
Topic starter
 

I stand corrected, and thanks for the information :) I will look into libpff in more detail.

 
Posted : 05/09/2011 6:48 pm
(@keydet89)
Posts: 3568
Famed Member
 

Chris,

Have you done any more work on this project? I went to the Google Code site (and will go back), and was looking for something that described what was there. Unfortunately, I don't see anything that describes the code.

Anything you could provide would be immensely helpful…thanks. This is a much needed capability within the community.

 
Posted : 06/04/2012 6:37 pm