Tuesday, March 8, 2011

Open source eDiscovery restarted

A couple of weeks ago I restarted FreeEed, an open source eDiscovery project. The initial announcement can be found here, but that was two years ago. Since then I have changed everything except the project name.

The project is now on GitHub, which is the new black for developers. It does complete processing and bears no resemblance to the old version, which relied on Google Desktop search.

Moreover, it is intended to run locally, on a private Hadoop cluster, or on the Amazon EC2 cloud. Thus, it can be as small or as big as you want.

Really, it is not at all hard to do this. Look, for example, at this company, talking about open source eDiscovery solutions, and you will see that the choice of components - Hadoop, Lucene, Mahout, Tika, Solr, Nutch - is pretty standard. With the road thus outlined, it is indeed strange that no one had done it so far - so there you go, I have done it now.
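To make the division of labor among those components a bit more concrete, here is a minimal sketch in plain Python. It is not FreeEed code and uses none of the libraries named above: the function names (`extract_text`, `map_document`, `reduce_postings`, `build_index`, `search`) are hypothetical, and each one only stands in for a role, namely text extraction (what Tika would do), a map and a reduce step (the shape of a Hadoop job), and an inverted index with term lookup (a toy version of what Lucene provides).

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def extract_text(raw: bytes) -> str:
    """Stand-in for text extraction; assumes the input is plain UTF-8 text."""
    return raw.decode("utf-8", errors="replace")


def map_document(doc_id, raw):
    """Map step: emit one (term, doc_id) pair per distinct term in a document."""
    text = extract_text(raw).lower()
    for term in set(TOKEN.findall(text)):
        yield term, doc_id


def reduce_postings(pairs):
    """Reduce step: group document ids by term into an inverted index."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return index


def build_index(documents):
    """Run the map step over every document, then reduce the emitted pairs."""
    pairs = (
        pair
        for doc_id, raw in documents.items()
        for pair in map_document(doc_id, raw)
    )
    return reduce_postings(pairs)


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data, for illustration only.
documents = {
    "a.txt": b"Merger agreement draft",
    "b.txt": b"Lunch agreement",
}
index = build_index(documents)
print(sorted(search(index, "merger agreement")))  # ['a.txt']
```

Because the map step looks at one document at a time and the reduce step only merges pairs, the same shape can run on one machine or be spread across a cluster, which is the property that makes the small-or-big deployment above plausible.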

In the past two weeks I have already accomplished about half the work. It is just a matter of combining the right libraries in the right way. Truth be told, the 90-10 rule applies here too: most of the fun and usefulness is in the first 90% of the functionality, which takes 10% of the time. The rest is details.

You can follow the project by joining the Google Group here, or on GitHub, where developers can get free accounts for easy collaboration.
