Thursday, March 29, 2018

FreeEed 7.7.1 release

Here is what is new in the FreeEed 7.7.1 release:

  • Restored deduplication
  • Better email handling
  • Separated processing engine code into its own project
  • All UI forms done in IntelliJ, out of NetBeans and away from commercial editors

Tuesday, September 12, 2017

Does FreeEed search for numbers? - Yes, it does!

One of our users asked whether he can find numbers in the text that FreeEed indexes. I got curious myself and checked.

This is an important question because I remember Craig Ball mentioning it as one of the requirements for good eDiscovery software. So I ran a few searches and found that, out of the box, FreeEed does index all numbers. That felt good, and I am attaching the screenshots of the experiment.

Of course, that is not a special property of FreeEed but of Tika, Lucene, and SOLR. It's these components that are responsible for what FreeEed indexes.

Had this not been the case, I would have tweaked the way the components are used, but luckily FreeEed already uses them this way. The advantage of passing searches through to these libraries is that users can rely on the well-known Lucene syntax.
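As a rough illustration of what searching for numbers with Lucene syntax can look like, here is a minimal Python sketch that builds Solr select URLs for a bare number, a quoted phrase of digits, and a boolean query. The host, port, and core name are hypothetical placeholders, and whether your FreeEed installation exposes Solr at such an address is an assumption; only the standard Solr `q`, `rows`, and `wt` parameters are used.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint: host, port, and core name are placeholders,
# not values taken from FreeEed.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_search_url(query, rows=10):
    """Build a Solr select URL for a query written in Lucene syntax."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR_SELECT}?{urlencode(params)}"


# Example Lucene-style queries involving numbers (illustrative only).
queries = [
    "48213",             # a bare number
    '"555 1234"',        # a phrase made of digits
    "invoice AND 2017",  # a boolean combination of a word and a number
]

urls = [build_search_url(q) for q in queries]
for url in urls:
    print(url)
```

The sketch only builds the URLs and does not contact a server, so it is safe to run anywhere.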

Monday, July 31, 2017

Couchbase at Houston Hadoop & Spark Meetup

Justin Tuggle presented well-justified reasons why today only NoSQL databases are up to the task of providing customer engagement and a means of business survival and, among them, why Couchbase is to be preferred.

Here is the link to the materials.

Thursday, July 20, 2017

An easy way to run FreeEed on Amazon

Running FreeEed on Amazon is very easy and offers some substantial benefits.

  1. You can get a fully provisioned server in a minute.
  2. You can get any size of hard drive and a large number of CPUs.
  3. It is as easy as using your desktop.

To start the server, find this AMI in the Oregon region on EC2: ami-e6acbf9f.

After you start the server, open the assigned IP in any browser. You will see a screen like the one below.

Click on 'vnc.html'. You will see the login screen.

After you log in, you will see a full Ubuntu desktop, where you can do any work. FreeEed is already installed.


Sunday, July 9, 2017

eDisco and Open Source Software

Today I am starting a series of blog posts on how to do eDiscovery with open source software. I will base it initially on the wonderful book "Project Management in Electronic Discovery". The advice that I give will not be limited to FreeEed but will draw on the complete range of Open Source, Data Science, etc.

Every eDiscovery person has her or his own set of tools, and I hope that these articles will add to your library. Let's organize those docs!


Saturday, July 8, 2017

New use cases for FreeEed

Today we release an early preview of FreeEed with the following use cases.

For the plaintiff

If you ask for the eDiscovery documents, you might eventually get them. Now, what do you do with them? 

The answer that FreeEed gives you is "Use the load file as the data source." That is, FreeEed allows you to load the documents you were sent and start reviewing them. 

For the researcher

Perhaps not directly related to eDiscovery, but people do use FreeEed for various research purposes. For example, at DARPA they loaded the court documents obtained from the NY Court of Appeals website and added some annotations (tags). Now, to do data analytics on the set, they need to export the documents back out, with the new tags. This is provided by the option "Export the load file," which will export either the full set, with the annotations, or the current search results.

For the techie

Sometimes your eDiscovery or other data is in the form of a JSON file. The JSON format is popular because it is flexible and allows you to define your own fields. In fact, you can change the fields from record to record.

This is now supported by selecting "JSON" as the input format, with the option "Use the load file as the data source."

Likewise, you can import any CSV file.
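To make the point about JSON fields changing from record to record concrete, here is a small Python sketch. It assumes one JSON object per line, which is only one common layout; the exact layout FreeEed expects is not described in this post, and all field names and values below are made up for illustration.

```python
import json

# Hypothetical records, one JSON object per line. The field names are
# illustrative only and are not FreeEed's schema.
raw_lines = [
    '{"id": 1, "subject": "Q3 forecast", "custodian": "alice"}',
    '{"id": 2, "subject": "Re: Q3 forecast", "attachments": 2}',
    '{"id": 3, "text": "See invoice 48213", "tags": ["finance"]}',
]


def load_records(lines):
    """Parse each non-empty line as a separate JSON object."""
    return [json.loads(line) for line in lines if line.strip()]


def field_union(records):
    """Collect every field name seen, in order of first appearance."""
    seen = []
    for record in records:
        for key in record:
            if key not in seen:
                seen.append(key)
    return seen


records = load_records(raw_lines)
fields = field_union(records)
print(fields)
```

Each record carries a different set of fields, and the union of field names is what a tabular format such as CSV would need as its header row.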

Other improvements include

* Extensive continuous testing with Jenkins
* Review: quick preview is now working

Friday, May 26, 2017

Sub-second SQL queries with LLAP from Hortonworks

The Houston Hadoop & Spark Meetup in April was graced by a presentation from Ravi Mutyala of Hortonworks. Here are the slides. Please refer to Ravi with further questions.