Thursday, February 9, 2017

FreeEed for eDiscovery response and for general research

FreeEed is a popular open-source eDiscovery tool. It boasts over 1,000 users, has active projects at major consulting companies, and is popular with researchers. However, it often needs to be used upside down. Here is what I mean.

In regular eDiscovery, you give FreeEed directories as input, and it processes them into these outputs:

  1. "Load file," or a CSV file with the metadata, one line per document or email.
  2. "Output file," a zip file containing native documents, extracted text, PDF images of all files, and exceptions, each in its folder.
  3. Case for review, loaded into the FreeEedUI review tool. The data is stored in SOLR as a back end, but the review itself is done in FreeEedUI.

However, there are two use cases that require the opposite: reviewing an eDiscovery response, and using FreeEed for research.

Reviewing the eDiscovery response

If you send an eDiscovery request, you may get back a load file and the documents. In essence, you are getting the data in the same format in which FreeEed outputs it. What you would like then is to reverse the process: make the load file the input and index the documents for search. This is now implemented in FreeEed.

When you select the input, you see a "Data Source" panel. If you choose eDiscovery, FreeEed will work as before, that is, accepting your custodians' files as input.

If you choose the "Load file" radio button as a data source, the program will do the following:
  • Read each line of the load file
  • For each line, use the given fields as metadata
  • Make the metadata and the extracted file text searchable and create a case in FreeEed for review

This option is available in FreeEed V 7.3. The use case lends itself very nicely to parallelization and can therefore be processed on a Hadoop cluster to accommodate large volumes.
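
To make the load-file steps concrete, here is a minimal Python sketch of the idea. It is not FreeEed's actual code: the column names (DocID, Custodian, Subject, TextPath), the read_text callback, and the tiny in-memory index are all assumptions for illustration. In FreeEed, the search index lives in SOLR.

```python
import csv
import io
from collections import defaultdict


def index_load_file(load_file, read_text):
    """Read a CSV load file and build a tiny in-memory inverted index.

    load_file: an open text file (or any iterable of CSV lines).
    read_text: hypothetical callback that maps a row to its extracted text.
    """
    documents = {}
    index = defaultdict(set)
    for doc_id, row in enumerate(csv.DictReader(load_file)):
        # Each line of the load file describes one document; its fields
        # become that document's metadata.
        text = read_text(row)
        documents[doc_id] = {"metadata": row, "text": text}
        # Index metadata values and text together so both are searchable.
        searchable = " ".join(value or "" for value in row.values()) + " " + text
        for token in searchable.lower().split():
            index[token].add(doc_id)
    return documents, index


# A made-up two-document load file; real load files have many more fields.
sample = io.StringIO(
    "DocID,Custodian,Subject,TextPath\n"
    "0001,alice,Budget review,text/0001.txt\n"
    "0002,bob,Lunch plans,text/0002.txt\n"
)
texts = {
    "text/0001.txt": "Quarterly budget attached.",
    "text/0002.txt": "Pizza on Friday?",
}

documents, index = index_load_file(sample, lambda row: texts[row["TextPath"]])
print(sorted(index["budget"]))  # ids of documents that mention "budget"
```

Each line is handled independently of the others, which is why this kind of job splits so easily across a cluster.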

Using FreeEed as a research tool

Often, researchers already have the metadata extracted. For example, in our Memex court document investigation, we already have elaborate parsing code that extracts metadata from the court documents. In this case, we want to be able to load the metadata and the file text into FreeEedUI for research. We should be able to answer questions like:
  • How many times was a given crime mentioned?
  • How many times was it mentioned by a particular judge and within a specific time range? (This question searches the metadata in a structured way, as well as the text.)

Clearly, this is the same use case as above. The only difference is that we need a different set of metadata fields than the one used in FreeEed by default. Technically, this amounts to programmatically changing the schema in SOLR, and this will be done in the next update, V 7.4.
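
As a rough illustration of what changing the schema programmatically and then querying it could look like, here is a Python sketch that only builds the requests and does not send them. It assumes SOLR's Schema API ("add-field" commands posted to the collection's schema endpoint) and the standard select parameters q, fq, and rows. The field names (judge, date_filed, text) and the field types are hypothetical and depend on the SOLR version and configuration; this is not how FreeEed V 7.4 is implemented.

```python
import json
from urllib.parse import urlencode


def add_fields_payload(fields):
    """Build a Schema API request body that adds (name, type) fields."""
    return json.dumps({
        "add-field": [
            {"name": name, "type": field_type, "stored": True, "indexed": True}
            for name, field_type in fields
        ]
    })


def count_query(term, judge, start, end):
    """Build select parameters that count documents mentioning a term,
    restricted to one judge and a date range (structured metadata filters)."""
    params = [
        ("q", f'text:"{term}"'),                    # full-text part
        ("fq", f'judge:"{judge}"'),                 # metadata filter
        ("fq", f"date_filed:[{start} TO {end}]"),   # date range filter
        ("rows", "0"),                              # we only need the count
    ]
    return urlencode(params)


# Hypothetical research fields; type names vary between SOLR setups.
payload = add_fields_payload([("judge", "string"), ("date_filed", "pdate")])
query = count_query("burglary", "Smith",
                    "2010-01-01T00:00:00Z", "2012-12-31T23:59:59Z")
print(payload)
print(query)
```

Note that the count returned for such a query is the number of matching documents, not the number of times the word occurs inside them.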
