Thursday, November 24, 2011

How to process Microsoft Outlook .PST files

Here is the efficient approach that FreeEed uses:
  • Convert PST to MBOX format. Use readpst on Linux and JPST on Windows. Previously I used individual EML files, but this is not as efficient, since there are too many of them. Dealing with MBOX files that correspond to top-level PST folders fits much better with the overall Hadoop processing;
  • Use JavaMail in conjunction with the mstor local access provider to process these MBOX files. This approach is great because it lets us use standard, high-quality components. It also gives full access to attachments, CC, BCC, etc.
I feel very good about this approach, because it combines best practices with overall efficiency.
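To show why an MBOX file is a convenient unit of work, here is a minimal plain-Java sketch of the MBOX layout: every message begins with an envelope line starting with "From ", so a single file can carry a whole PST folder. This is not the JavaMail/mstor approach described above (mstor does the real parsing, including MIME parts and attachments); `MboxSplitter` is a hypothetical helper written only to illustrate the format, and it assumes the writer escaped body lines that begin with "From " (as MBOX writers normally do), so they appear as ">From " and are kept verbatim.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits MBOX text into raw messages.
// Illustrates the format only; use JavaMail + mstor for real parsing (MIME, attachments).
final class MboxSplitter {

    private MboxSplitter() {}

    /** Splits MBOX text into raw RFC 822 messages; the "From " separator lines are dropped. */
    static List<String> split(String mboxText) {
        try (BufferedReader reader = new BufferedReader(new StringReader(mboxText))) {
            return split(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Streaming variant, e.g. for a BufferedReader over an MBOX file on disk. */
    static List<String> split(BufferedReader reader) throws IOException {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null; // stays null until the first separator is seen
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("From ")) {
                // An envelope line starts a new message and closes the previous one.
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
            } else if (current != null) {
                // Escaped ">From " body lines never match the test above; keep them as-is.
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }
}
```

Each returned string still ends with the blank line that preceded the next separator, and any text before the first "From " line is ignored. In the real pipeline this splitting is exactly what mstor takes off your hands.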

Tuesday, November 22, 2011

Adding image creation to FreeEed

By "image creation" in eDiscovery we mean making PDF or TIFF images of the originals. Having these is convenient for review, because it eliminates the need for the various applications required to open the native file formats, and the images are also useful for redaction.

For the last three weeks I have been starting a new assignment that deals with text analytics and understanding in the context of Big Data. This is great, because a deeper knowledge of the field will help me create open source tools for automated document review later on. But it also meant that I had only a couple of hours in the evening to work on FreeEed, and only on two evenings.

Nevertheless, this was enough. OpenOffice and LibreOffice are free, open source office suites that can print MS Office documents to PDF, and JODConverter is a Java bridge that lets your code talk to them. Altogether, printing takes five lines of code. Here they are:

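OfficeManager officeManager = new DefaultOfficeManagerConfiguration().buildOfficeManager();
officeManager.start(); // launches a headless OpenOffice/LibreOffice process
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
converter.convert(new File("test.odt"), new File("test.pdf")); // output format is chosen from the ".pdf" extension
officeManager.stop(); // shuts the office process down
Taking out the start/stop code, you have just one line:
converter.convert(new File("test.odt"), new File("test.pdf"));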
That's it! One line of code (and plenty of computing power) converts all the MS Office file formats to PDF. Isn't this amazing? Whenever you need more computing power, you can rent it cheaply in the cloud, and this is where FreeEed really shines, because it is designed for parallel processing in the cloud.
Sometimes I wish I had more time for FreeEed, perhaps even to work on it full-time. But then again, since I can do so much with great open source tools, maybe that is not even necessary.
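Because one call converts one document, batch conversion is a loop around that call. The following is a minimal sketch, not FreeEed's actual code, that selects files by extension, derives a `.pdf` target for each, and spreads the work over a thread pool. `BatchPdfConverter`, its extension list, and the `BiConsumer` hook are hypothetical; in real use the hook would wrap `converter.convert(in, out)` with the `OfficeManager` already started. It only shows parallelism inside one machine, whereas FreeEed itself parallelizes across a Hadoop cluster.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical batch driver around a single-file "convert(input, output)" call.
final class BatchPdfConverter {

    // Assumed list of extensions to hand to the office suite; adjust as needed.
    static final Set<String> OFFICE_EXTENSIONS =
            Set.of("doc", "docx", "xls", "xlsx", "ppt", "pptx", "rtf", "odt", "ods", "odp");

    private BatchPdfConverter() {}

    static boolean isOfficeDocument(File f) {
        String name = f.getName();
        int dot = name.lastIndexOf('.');
        return dot > 0 && OFFICE_EXTENSIONS.contains(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Same base name with a .pdf extension, placed in outDir. */
    static File pdfTargetFor(File input, File outDir) {
        String name = input.getName();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return new File(outDir, base + ".pdf");
    }

    /** Converts every Office document in inputs; returns the PDFs in input order. */
    static List<File> convertAll(List<File> inputs, File outDir,
                                 BiConsumer<File, File> convert, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<File>> pending = new ArrayList<>();
            for (File in : inputs) {
                if (!isOfficeDocument(in)) {
                    continue; // skip files the office suite is not expected to open
                }
                File out = pdfTargetFor(in, outDir);
                pending.add(pool.submit(() -> {
                    convert.accept(in, out);
                    return out;
                }));
            }
            List<File> written = new ArrayList<>();
            for (Future<File> f : pending) {
                written.add(f.get()); // waits in input order; rethrows conversion failures
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while converting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("conversion failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

To plug in JODConverter, pass `(in, out) -> converter.convert(in, out)`. Two caveats of this sketch: inputs sharing a base name (`a.doc` and `a.xls`) map to the same `a.pdf`, and how many conversions one `OfficeManager` runs at once depends on how many office processes it is configured to start, so check the JODConverter documentation before raising `threads`.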

Tuesday, November 8, 2011

Evan Koblentz on eDiscovery pricing

"The other extreme is products and services that are cost-free. The open-source FreeEED project may get traction now that it's available for Microsoft Windows. In addition, FreeEED will soon tackle automated document review, project leader Mark Kerzner said. Open source is sometimes controversial for developers' philosophical approach, and for what closed-sourced vendors allege are hidden costs of implementation and support. But at least you know what the base software costs -- nothing."

Full article

Friday, November 4, 2011

Open source eDiscovery (FreeEed) for Windows is released

All paralegals, lawyers, and do-it-yourselfers!

We have released a version of FreeEed that runs on Windows. It also runs on a Hadoop cluster, is scalable, and is free. You can find it here.

Thank you. Sincerely,
FreeEed Team

Art: Bruegel, Pieter the Elder - Dance

Thursday, November 3, 2011

So I paid $548.30

...and I own a JPST license from its vendor. It gives me unlimited distribution rights, so everybody can use FreeEed for free, and it works on Windows and extracts PST files. JPST support goes into Release Candidate 2, in which all known bugs are fixed. The production release is coming as soon as I receive the licensing info.