Thursday, August 28, 2008

What is wrong with eDiscovery

The article in the Economist
  1. Argues that the US legal system is already a "sick patient", and that eDiscovery threatens a lethal "spike in fever";
  2. As an example, describes a discovery request to produce the MySpace, Facebook, and chat records of teenage girl patients in a medical insurance lawsuit;
  3. Explains that many foreign countries limit discovery because of their inquisitorial systems, in contrast to the US adversarial system;
  4. And suggests that judges should limit the amount of eDiscovery.
I could not agree more, and I have seen opinions that the new FRCP rules have failed to stem the deluge of eDiscovery. However, these are lonely voices, and I do not see relief coming soon. Currently, most of the effort, for both lawyers and judges, goes into coming to grips with eDiscovery, and not into finding what is wrong and fixing it.

Tuesday, August 26, 2008

litsupport summary for the week ending on 8/24/08

A lot of important and useful information is posted to litsupport each week. The following is a distilled summary, in the form of questions and answers.

Q. What are the ways to destroy information on a hard drive before disposing of it?
A. DBAN, EBAN, sledgehammer, blowtorch, gun (for those so inclined), sanding the face of each drive platter, drilling through the drive, and commercial hard drive destructors, crushers, and shredders. For fun, search YouTube for "harddrive" and "thermite".

Q. What is a simple way to capture video, such as from a bank surveillance tape, and convert it to AVI?
A. Quick Media Converter, VideoHelp, VLC, 3GP, M1 Edit (for capture and simple editing), Camtasia Studio Screen Recorder, and ED-Video. Remember to export in the correct format for the platform you will be using in court.

This summary from the Litsupport Group postings created by the wonderful and talented members of the group has been culled by Mark Kerzner (mkerzner@top8.biz) and edited by Aline Bernstein (aline.bernstein@gmail.com).

Monday, August 18, 2008

litsupport summary for the week ending on 08/17/08

A lot of important and useful information is posted to litsupport each week. The following is a distilled summary, in the form of questions and answers.

Q. Can one use disposed (such as sold on eBay) hard drives for forensics training?
A.
Yes:
  1. Great for "free range" data to play with;
  2. This is a routine practice with many CCE examiners;
  3. Software licenses are not a problem - do not use the software on the drive, and private data is not a problem - study it for technical reasons but do not use it;
  4. The ethical responsibility is to do good work for the client;
  5. The person who sells the drive has lost his right to data privacy;
  6. There is no case law against this practice;
  7. This is no different from Google reading your gmail and automatically showing you the relevant ads.
No:
  1. You cannot post recovered files, because they are not in the public domain;
  2. This is legally questionable, possibly as conversion or identity theft, and ethically bankrupt, because lawyers are held to a higher ethical standard, and because it makes a vendor look dirty;
  3. You own the drive but not the data, and you may have a stolen drive;
  4. Used drives do not work well, and one should not sell them anyway, but destroy them instead, and this is the advice to give to clients;
  5. "Nobody knows" is not an excuse; instead, spend the money and prepare legal data sets;
  6. The person who sold the drive did not realize that she was giving away her data, so it is stealing, and most people don't know that formatting their hard drive doesn't protect their data;
  7. Testing the tools on unknown data sources does not validate the tools anyway.

Q. How can one track an internet site poster, given the poster's IP address?
A.
  1. From the IP you can find out the provider details using software such as PtWhoIs. You then subpoena the provider (sometimes through a John Doe lawsuit) to help you determine the physical address to which that IP address was issued. A forensic exam of the computer at that physical address may turn up the remnants you need to ultimately prove which computer was used to make the post;
  2. Road bumps along the way may include a dynamic IP whose assignment is lost after 30 days, a wireless router that was used by someone else driving by, a spoofed or hacked IP, or an IP hidden behind an anonymizer service;
  3. An article on this and RIAA practices;
  4. New research deals with data preserved in computer memory for a long time (the forensics side) and with probabilistic author identification based on comparison with a corpus of known email from the user;
  5. In one practical case the combination of ISP information with linguistic analysis led to admission, and no forensics exam was required;
  6. Voluntary disclosure of information on a public website falls outside of any privacy protections one would want to later claim. It is one of the few exceptions to the Stored Communications Act (if you post the information yourself, you cannot then claim privacy protection for who you are).
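The first step in item 1, finding the provider behind an IP address, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how PtWhoIs works: it speaks the plain WHOIS protocol (a text query sent to TCP port 43), the default server whois.arin.net covers North American address space only (other regions are handled by other registries), and the function names whois_query and parse_fields are made up for this example.

```python
import socket


def whois_query(ip, server="whois.arin.net", port=43, timeout=10.0):
    """Send one WHOIS query for an IP address and return the raw text reply.

    Assumption: the address belongs to the region served by `server`.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(ip.encode("ascii") + b"\r\n")  # a query is one text line
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(reply):
    """Collect 'Key: value' lines from a WHOIS reply into a dict of lists."""
    fields = {}
    for line in reply.splitlines():
        if line.startswith(("#", "%")) or ":" not in line:
            continue  # skip comment lines and free text
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key and value:
            fields.setdefault(key, []).append(value)
    return fields
```

Calling parse_fields(whois_query(ip)) should give the organization and network fields to name in the subpoena. The field names differ from registry to registry, and the lookup says nothing about the road bumps listed in item 2.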

This summary from the Litsupport Group postings created by the wonderful and talented members of the group has been culled by Mark Kerzner (mkerzner@top8.biz) and edited by Aline Bernstein (aline.bernstein@gmail.com).

Tuesday, August 12, 2008

BYOS - Build Your Own Startup - on the Cloud! - Issue 3

The geek world is aglow, but the business world is cool

Ask any geek what is new in the world today, and he may drown you in excitement about cloud computing. But ask a technical manager about it, and he will cool you with his talk about the total cost of ownership. Ask a businessman, and you may get a "huh?" We are forced to admit that there is a disconnect between the geeks and the business world. Why is this? That is what we have set out to investigate. Google App Engine is the subject today.

B. Pleemo, you look very mysterious today. And why did you invite me to the zoo?
P. Well, Bernard, I could not wait for YOU to come up with an idea, so to spur you on, I invented one myself. To implement it, I will need python. I do not know if this snake is a python, but my Python is a programming language used by Google App Engine.
B. Firstly, I accept the challenge. I do not know what you have in mind with your idea, but I will show you mine at our next meeting. Now continue with yours!
P. Okay, Bernard, as you may remember, I am Jewish, and I had an idea of building a web site where everybody can set Jewish calendar reminders. There is a site or two like it, but my idea is to make it unaffiliated, very flexible, and individualistic, so every person or a group of people can use the site in their personalized way.
B. So why the cloud?
P. I do not expect to make money on this, so I cannot spend much. However, I have to plan for the eventuality that it will become popular, and I need my site to be able to handle occasional peak loads.
B. I see...
P. Enter Google App Engine! It lets you upload your application to the Google Cloud. It is free until you get to 5 million page views a month. And Google scales it for you on demand. I have already started my application right here! Not much, admittedly, but give me two weeks - for Python is new to me. You can, however, already see some exchanges there, and even some hackers.
B. Pleemo! I can tell you the character of the Google Cloud right away myself: it is opaque, simpler to use, and it is free. It fits your purpose perfectly well. You have beaten me to the business idea. My only consolation is that it is not properly business.
P. Bernard, you are a great student of mine. Note, also, that Google makes you use "datastore". It is simpler than a database and it too has no limit.
B. Thank you Pleemo! I will study the links you gave me, and watch for your idea's progress - since here we are back to discussing the speed of development and the risk of failure.
P. Good, Bernard, and if you really want to study, then google about Dell trying to acquire the 'cloud computing' trademark, or read the explanation of cloud vs grid vs distributed computing on Mark's blog. Arrivederci in two weeks.

I enjoyed the trip to the zoo, where I have not been since my youngest children stopped asking for it, and I am curious to see what will happen in two weeks.
My reporting is getting easier, and my homework lighter.

Monday, August 11, 2008

Technology for Lawyers and Paralegals: Evidence Authentication - Word Documents

Question

Electronic evidence presents unique authentication challenges. What are the specific issues for MS Word files?

Judge Grimm on Evidence


On May 4, 2007, Chief U.S. Magistrate Judge Grimm provided a detailed analysis of evidentiary issues associated with electronic evidence.

As Electronic Discovery Law explains, in Lorraine v. Markel Am. Ins. Co., 241 F.R.D. 534 (D. Md. 2007), the parties filed cross-motions for summary judgment but failed to comply with the requirement of Rule 56 that they support their motions with admissible evidence. Chief United States Magistrate Judge Paul W. Grimm denied both motions without prejudice to allow resubmission with proper evidentiary support.

In his memorandum opinion, Magistrate Judge Grimm remarks that "considering the significant costs associated with discovery of ESI, it makes little sense to go to all the bother and expense to get electronic information only to have it excluded from evidence or rejected from consideration during summary judgment because the proponent cannot lay a sufficient foundation to get it admitted."

Technical homework

The following discussion uses MS Word documents as an example, but it is applicable to most other document types.

There are two types of data: document data (words, formatting, etc.), and metadata, or data about data. Furthermore, there are two types of metadata: application metadata (title, author, last saved by, etc., even custom fields), and OS metadata (file creation date, file last modified, and more).

To verify the document data together with its metadata, it is possible to compute and record the document's MD5 or SHA signature. Since the application metadata is stored in the file, it too will go into the hash calculation.
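As an illustration, the hash of a file can be computed with Python's standard hashlib module. This is a minimal sketch, not a validated forensic tool; the function name file_digests is made up for this example, and SHA-256 is chosen here as one member of the SHA family.

```python
import hashlib


def file_digests(path, chunk_size=1024 * 1024):
    """Return the MD5 and SHA-256 hex digests of a file's bytes.

    The digests cover everything stored inside the file, so for a Word
    document they include the application metadata as well as the text.
    """
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:  # read-only, binary mode
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

Changing a single character of the text, or a single application metadata field, produces different digests. Note that reading a file can update its last-accessed time on some systems, so it is safer to hash a forensic copy than the original.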

For the OS metadata, one can rely on the collection procedure, which hopefully has been done with a validated tool for the best defensibility.

Lacking this, one may go back to the original and use a safe metadata viewer to pull the original OS info (assuming that it has not been modified in the meantime). One can use a tool like Pinpoint Metaviewer to create screenprints for presentation.
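To show what "OS info" means in practice, here is a small Python sketch that reads the file system metadata of a file without opening its contents. It is not a substitute for a validated collection tool or for a viewer such as the one named above; the function name os_metadata and the returned field names are made up for this example.

```python
import os
from datetime import datetime, timezone


def os_metadata(path):
    """Read file system metadata for a file without reading its contents."""
    info = os.stat(path)

    def as_utc(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

    return {
        "size_bytes": info.st_size,
        "modified": as_utc(info.st_mtime),
        "accessed": as_utc(info.st_atime),
        # On Windows st_ctime is the creation time; on Unix it is the time
        # of the last metadata change, not the creation time.
        "ctime": as_utc(info.st_ctime),
    }
```

The values are reported in UTC so that they do not depend on the time zone of the examiner's machine.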

For file system metadata one can create a container (such as a zip file) with all the authenticated files together, and compute the signature of the archive. If the files have not been moved, touched, or opened, this will preserve the OS metadata that the container format records (for a plain zip file, typically the last-modified time).
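The container approach can be sketched with Python's standard zipfile module. This is an illustration under stated assumptions: the function names are made up for this example, files are stored under their base names (so two files with the same name in different folders would collide), and a zip entry written this way records only the last-modified time, in local time with two-second resolution.

```python
import hashlib
import os
import zipfile


def build_evidence_archive(paths, archive_path):
    """Store files uncompressed in a zip file and return its SHA-256."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for path in sorted(paths):
            # ZipFile.write copies the file's last-modified time into the
            # entry header along with the file's bytes.
            archive.write(path, arcname=os.path.basename(path))
    with open(archive_path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def archived_timestamps(archive_path):
    """Return {entry name: (year, month, day, hour, minute, second)}."""
    with zipfile.ZipFile(archive_path) as archive:
        return {entry.filename: entry.date_time for entry in archive.infolist()}
```

The digest identifies this particular archive. Zipping the same files again with another tool may give different bytes, so the archive itself is what should be preserved and exchanged.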

The signatures thus collected can be used to prove that the evidence has not been tampered with. One can ask the opposing side for the same hashes, and if they agree, there is no argument that both are looking at the same evidence.
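Once both sides have a list of hashes, the comparison itself is mechanical. The Python sketch below assumes that each side supplies a simple mapping from file name to hex digest; that format and the function name compare_manifests are assumptions for this example, not something prescribed by the rules.

```python
def compare_manifests(ours, theirs):
    """Compare two {file name: hex digest} manifests."""
    shared = ours.keys() & theirs.keys()
    return {
        # Hex digests are compared case-insensitively.
        "matching": sorted(n for n in shared if ours[n].lower() == theirs[n].lower()),
        "mismatched": sorted(n for n in shared if ours[n].lower() != theirs[n].lower()),
        "only_ours": sorted(ours.keys() - theirs.keys()),
        "only_theirs": sorted(theirs.keys() - ours.keys()),
    }
```

Any name that lands in "mismatched" is a document on which the two sides are not looking at the same bytes, and is worth raising before trial.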

Legal approach

To quote from Judge Grimm,

"Authentication also can be accomplished in civil cases by taking advantage of FED. R. CIV. P. 36, which permits a party to request that his or her opponent admit the 'genuineness of documents.'"

If the other side has some doubts about the possible changes in the document, your file hash will prove that both the document and the metadata are intact.

Furthermore, "...at a pretrial conference, pursuant to FED. R. CIV. P. 16(c)(3), a party may request that an opposing party agree to stipulate 'regarding the authenticity of documents,' and the court may take 'appropriate action' regarding that request."

For example, this may be a case where this document has already been under discussion. More generally, once each counsel has exchanged a description of ESI held by a party, one topic for the "meet and confer" can be some form of agreement as to authenticity or at least some stipulation as to what must be done to avoid objections on this basis.

"Similarly, if a party properly makes his or her FED. R. CIV. P. 26(a)(3) pretrial disclosures of documents and exhibits, then the other side has fourteen days in which to file objections. Failure to do so waives all objections other than under Rules 402 or 403, unless the court excuses the waiver for good cause. This means that if the opposing party does not raise authenticity objections within the fourteen days, they are waived."

This is a very important and easier path, since no action is required from the other side.

If the other side produced the document, then, absent special circumstances, this is tantamount to the admission of authenticity.

So far, we have used the FRCP rules. Of course, the arguments are strengthened by proper collection procedure and by availability of hash signature for verification, but strictly speaking these are not required.

The following, more technical, scenarios can use the hashes discussed above:

  1. Presence of the same document (as authenticated by application hash) as an attachment in email from this custodian;
  2. Presence of the same document (application hash) on another computer or laptop belonging to the same custodian;
  3. Presence of the same document (both application and OS hash) in a backup. In this case you will need somebody to testify about the backup procedures.
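Finding "the same document" in several places comes down to grouping files by their content hash. The Python sketch below assumes that the email attachments, the laptop files, and the restored backup have already been extracted into ordinary folders; the function name find_copies is made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def find_copies(root_dirs):
    """Map SHA-256 digest -> paths of files with byte-identical content."""
    by_digest = defaultdict(list)
    for root in root_dirs:
        for folder, _subdirs, names in os.walk(root):
            for name in names:
                path = os.path.join(folder, name)
                with open(path, "rb") as handle:
                    digest = hashlib.sha256(handle.read()).hexdigest()
                by_digest[digest].append(path)
    # Keep only content that appears in more than one place.
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

The file names do not have to match: a document saved as an attachment under another name still has the same digest as long as its bytes are unchanged.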

Note: Legal information is not legal advice. Top8 provides information pertaining to business, compliance, and litigation trends and issues for educational and planning purposes. Top8 and its consultants do not provide legal advice. Readers should consult with competent legal counsel.

The author gratefully acknowledges the editing help and numerous suggestions of Kelvin Rocquemore, Esq., of Trial Solutions.

The author is also thankful to his colleagues at the litsupport discussion group, whose discussions provide him with much inspiration and knowledge.

litsupport summary for the week ending on 08/10/08

A lot of important and useful information is posted to litsupport each week. The following is a distilled summary, in the form of questions and answers.

Q. Is there a QuickBooks Viewer?
A. One can download the "Simple Start" application from the Intuit website (search there). It is free and should open QuickBooks files. For proper chain of custody, one can try to make the file read-only or at least keep a backup version.

Q. How can one study Summation?
A. One can request an evaluation copy from the web site; if granted, it will be valid for a year. There is also a "Lawyer's Guide to Summation" (paperback, published 2004), and there are webinars here.

Q. How to authenticate MS Word docs as evidence?
A.
  1. Hashing the file objects covers their content and their object (application) metadata. For file system metadata, encapsulate the file objects into an archive (ZIP, RAR, TAR, ISO) and hash the archive;
  2. For the OS info, data should have been collected with a validated tool for the best defensibility. If not, you could go back to the original and use a safe metadata viewer to pull the original OS info (assuming that it has not been modified in the meantime). One can use, for example, Pinpoint Metaviewer;
  3. Look at Judge Grimm's opinion in Lorraine v. Markel Am. Ins. Co., 241 F.R.D. 534 (D. Md. 2007);
  4. Summarized and expanded upon in a newsletter here.

Q. What is near-deduplication and how reliable is the process?
A.
  1. Near-duplicate identification uses a similarity measure to group versions of an item. It is applicable to finding almost identical versions of an email, an MS Word doc, and other documents. It is useful in investigations, and for consistency of review;
  2. Near-duplicate detection breaks documents into overlapping shingles of a certain length. A shingle is a sequence of words (or letters): the first shingle starts with the first word in a file, the next starts with the second word, and so forth. The common algorithm then chooses a sample of these shingles from each document using a rule that is likely to yield the same shingles from different documents (if they are present). Simplifying a bit, the probability that two documents are near duplicates is the proportion of the sampled shingles that are shared by the two documents. See more here and here.
  3. There's no such thing as "reliable" near de-duplication. The entire science is subjective and prone to error :)
  4. Although bulk coding of near-dupes is not recommended, the foundation methods of Equivio, Attenex, Syngence, etc. are just as scientific/repeatable as full text search for keywords. Every tool has an appropriate use;
  5. Google has a patent for "Method and Apparatus for Estimating Similarity." Google needs it to avoid listing essentially the same pages in the search results (as some people use such pages to direct traffic to their sites). Compared to the bottom-up methods described above, the Google patent is top-down in that it generates sketches of the objects being compared, and similarity is based on these sketches.
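The shingling idea in item 2 can be sketched in Python. This is a textbook-style illustration, not the method of any vendor named above: the sampling rule used here (keep the shingles with the smallest hash values) is one common choice, and the function names, the shingle size of 3 words, and the sample size of 50 are assumptions for this example.

```python
import hashlib


def shingles(text, size=3):
    """Return the set of overlapping word shingles of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def sketch(shingle_set, keep=50):
    """Sample a document: keep the `keep` smallest shingle hash values.

    The same shingle always hashes to the same value, so two documents
    that share shingles tend to keep the same ones.
    """
    hashes = sorted(
        int(hashlib.sha1(s.encode("utf-8")).hexdigest(), 16) for s in shingle_set
    )
    return set(hashes[:keep])


def estimated_resemblance(set_a, set_b, keep=50):
    """Estimate the share of shingles two documents have in common."""
    sketch_a, sketch_b = sketch(set_a, keep), sketch(set_b, keep)
    union_sample = set(sorted(sketch_a | sketch_b)[:keep])
    if not union_sample:
        return 1.0  # two empty documents are treated as identical
    return len(union_sample & sketch_a & sketch_b) / len(union_sample)
```

For example, "the quick brown fox jumps" and "the quick brown fox leaps" share two of their four distinct three-word shingles, so the estimate is 0.5. The threshold above which two documents count as near duplicates is a judgment call, which is part of what item 3 is complaining about.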
This summary from the Litsupport Group postings created by the wonderful and talented members of the group has been culled by Mark Kerzner (mkerzner@top8.biz) and edited by Aline Bernstein (aline.bernstein@gmail.com).

Monday, August 4, 2008

litsupport summary for the week ending on 08/03/08

A lot of important and useful information is posted to litsupport each week. The following is a distilled summary, in the form of questions and answers.

Q. What are the recommended forensics certifications for legal work?
A. CCE is a non-vendor certification focusing on methodology, terminology, documentation, and standards, and it will come in very handy in court. Furthermore, it is a PI requirement in some states. Beyond that, consider at least one product certification, such as EnCE, FTK, X-Ways, or Pro-Discover, and GCFA for incident investigation.

Q. Is it a good or a bad idea to use OCR-based searching for first-pass privilege review in lieu of page-by-page review?
A.

GOOD:
  1. Good for a first-pass priv review. Segregate the hits and their associated family docs into a "potentially privileged" review set for 1 or 2 atty's to eyeball. Be careful with search terms: searches for a laundry list of atty names and law firm can be over-inclusive. Thus, as an overall strategy to reduce risk at the outset, it's a good idea;
  2. Great but it depends upon the OCR. Extracted text - yes, paper OCR - no. Use the OCR searches to help your review, but not as a "first pass priv. review."

BAD:
  1. Probably a bad idea if they are going to perform a page-by-page review of only those documents brought back by the search, while all other documents are assumed to be non-privileged and produced without a page-by-page review. It is DEFINITELY a bad idea if there is no clawback agreement. This situation was dealt with in Judge Grimm's decision in Victor Stanley v. Creative Pipe and resulted in a waiver of privilege;
  2. The OCR search won't necessarily find potentially privileged documents with client (or attorney) handwritten notes.
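The segregation step described under GOOD, item 1, can be sketched in Python. This only illustrates the mechanics of building a "potentially privileged" set; it is not a review protocol. The record layout (a dict with 'id', 'family', and 'text' keys) and the function name are assumptions for this example.

```python
import re


def potentially_privileged(documents, terms):
    """Flag documents whose text hits a search term, plus their families.

    `documents` is a list of dicts with 'id', 'family', and 'text' keys;
    'family' ties an email to its attachments.
    """
    if not terms:
        return {"hits": [], "review_set": []}
    # Match whole terms only, ignoring case.
    pattern = re.compile(
        "|".join(r"(?<!\w)" + re.escape(term) + r"(?!\w)" for term in terms),
        re.IGNORECASE,
    )
    hit_ids = {d["id"] for d in documents if pattern.search(d["text"])}
    hit_families = {d["family"] for d in documents if d["id"] in hit_ids}
    review_ids = {d["id"] for d in documents if d["family"] in hit_families}
    return {"hits": sorted(hit_ids), "review_set": sorted(review_ids)}
```

The search is only as good as the text behind it: a document with poor OCR or handwritten notes will produce no hit, which is exactly the risk raised under BAD.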
This summary from the Litsupport Group postings created by the wonderful and talented members of the group has been culled by Mark Kerzner (mkerzner@top8.biz) and edited by Aline Bernstein (aline.bernstein@gmail.com).