By "image creation" in eDiscovery we mean producing PDF or TIFF images of the original documents. Having these is convenient for review, because reviewers no longer need the various applications required to open the native file formats, and the images are also useful for redaction.
Over the last three weeks I have been starting a new assignment that deals with text analytics and text understanding in the context of Big Data, which is great, because deeper knowledge of the field will help me create open source tools for automated document review later on. But it also meant that I only had a couple of hours to work on FreeEed in the evening, and only on two evenings.
Nevertheless, this was enough. OpenOffice and LibreOffice are free, open source applications that can print MS Office documents to PDF, and JODConverter is a bridge that lets Java code talk to them. Altogether, printing is done with five lines of code. Here they are:
OfficeManager officeManager = new DefaultOfficeManagerConfiguration().buildOfficeManager();
officeManager.start();
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
converter.convert(new File("test.odt"), new File("test.pdf"));
officeManager.stop();
Taking out the start/stop code, you have just one line:
converter.convert(new File("test.odt"), new File("test.pdf"));
That's it! One line of code (and lots of computing power) to convert all MS Office file formats to PDF. Isn't this amazing? Whenever you need more computing power, you can get it cheaply from the cloud, and this is where FreeEed really begins to shine, because it is designed for parallel processing in the cloud.
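To make the "one line per document" idea a bit more concrete, here is a minimal sketch of how a whole batch of files might be pushed through that one line. It is not FreeEed's actual code: the class PdfBatch and its methods pdfNameFor and convertAll are hypothetical names made up for this illustration, and the conversion step is passed in as a plain BiConsumer so the sketch does not depend on any particular library.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical helper for illustration only; not part of FreeEed or JODConverter.
class PdfBatch {

    // Swap the last extension of a file name for ".pdf".
    // Names without an extension (or dot-files) simply get ".pdf" appended.
    static String pdfNameFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = (dot > 0) ? fileName.substring(0, dot) : fileName;
        return base + ".pdf";
    }

    // Run the supplied conversion step once per input file, writing into outputDir.
    // Returns the target files in the same order as the inputs.
    static List<File> convertAll(List<File> inputs, File outputDir,
                                 BiConsumer<File, File> convert) {
        List<File> outputs = new ArrayList<>();
        for (File input : inputs) {
            File output = new File(outputDir, pdfNameFor(input.getName()));
            convert.accept(input, output); // the "one line" from the post goes here
            outputs.add(output);
        }
        return outputs;
    }
}
```

Assuming the converter object from the five lines above, a call such as PdfBatch.convertAll(files, outDir, converter::convert), placed between officeManager.start() and officeManager.stop(), should do the job. Keeping the conversion step as a parameter also makes it easy to swap in a stub when testing without an office installation.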
Sometimes I wish I had more time for FreeEed, perhaps even working on it full-time. But then again, since I can do so much with these great open source tools, maybe that is not even necessary.